Sent 2010-09-01 by jitendra rajput <jeet.loves@...>
Hi,
I have gone through the tutorial on writing a plugin inside the Nutch source
tree itself. But I want to write a Nutch plugin in my own package, with the
Nutch jar on its build path. Is that possible?
Can anyone point me in the right direction? Any help would be
appreciated.
--
Thanks and...
Sent 2010-09-01 by Alex McLintock <alex.mclintock@...>
This should really be a user type question, not a dev question. But
what the heck.
The first thing which comes to mind is to do the search yourself and
provide the results of that search as seed pages.
But since you asked on the dev mailing list, you could possibly write
something which actuall...
Sent 2010-09-01 by Shanthoosh PV <shanthoosh@...>
Hi,
I want to crawl the results of a user-defined keyword search in a search
engine. Is it possible to do this in Nutch? Please provide useful insights; I
tried searching this forum and Google but found nothing helpful.
The user may p...
Sent 2010-08-31 by AJ Chen <ajchen@...>
Thanks for suggesting multiple segments approach - it's the way to go for
further increasing crawling throughput. I tried the -maxNumSegments 3
option in local mode, but it did not generate 3 segments. Does the option
work? It may only work in distributed mode.
I also observe that, when fet...
Sent 2010-08-31 by Jitendra <jeet.loves@...>
Thanks a ton, Volli.
I wasted 2 days trying to figure this out and never noticed that
crawl-urlfilter.txt
also contains regular expressions for filtering URLs.
Volli wrote:
>
> Did you try already to switch off the regexp in
> crawl-urlfilter.txt?
>
> if you use
> bin/nutch crawl...
> for crawling cra...
Sent 2010-08-31 by Volli <illov@...>
Did you try already to switch off the regexp in
crawl-urlfilter.txt?
if you use
bin/nutch crawl...
for crawling, crawl-urlfilter.txt must be changed.
Compare the other lines, too. See "# skip everything else" and
"# accept anything else".
On 31.08.2010 10:32, jitendra rajput wrote:
> Hi,
>
> I a...
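As an illustration of the advice above, here is a minimal Python sketch of how a sign-plus-regex filter file such as crawl-urlfilter.txt can drop URLs containing characters like ? and =. It is not Nutch's own code. The rule text, the first-match-wins behaviour, and the names RULES, parse_rules and accepts are assumptions made for this example; check the rules in your own copy of the file.

```python
import re

# Hypothetical rule set written in the style of crawl-urlfilter.txt:
# '+' accepts, '-' rejects, and the rest of the line is a regex.
# The actual contents of your file may differ.
RULES = """
# skip URLs containing certain characters as probable queries
-[?*!@=]
# accept anything else
+.
"""


def parse_rules(text):
    """Turn rule text into a list of (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Assumed semantics: the first rule whose regex is found in the URL decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # no rule matched, so the URL is dropped


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://example.com/page.html",
                "http://example.com/search?q=nutch"):
        print(url, "accepted" if accepts(url, rules) else "filtered out")
```

Under these assumptions, removing or commenting out the -[?*!@=] line lets the query-style URL fall through to the final accept rule, which matches the symptom described in the thread.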
Sent 2010-08-31 by jitendra rajput <jeet.loves@...>
Hi,
I am trying to write an XpathBasedLinkExtractor which extracts links from an
HTML page using XPaths.
But all the extracted links which contain characters like [? , = ] are
being filtered out. I cannot pin down where this is happening.
They are not going into segments.
I have also comm...
Sent 2010-08-30 by Andrzej Bialecki <ab@...>
On 2010-08-30 12:21, Otis Gospodnetic wrote:
> Hello peeps,
>
> We've created a patch for Tika and got some good and constructive feedback (see
> https://issues.apache.org/jira/browse/TIKA-488 ).
>
> Should we follow the same functionality pattern for nutch.apache.org as seen in
> TIKA-488?
Sure...
Sent 2010-08-30 by Otis Gospodnetic <ogjunk-nutch@...>
Hello peeps,
We've created a patch for Tika and got some good and constructive feedback (see
https://issues.apache.org/jira/browse/TIKA-488 ).
Should we follow the same functionality pattern for nutch.apache.org as seen in
TIKA-488?
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr ...
Sent 2010-08-28 by Savannah Beckett <savannah_beckett30@...>
One more thing: in CustomFieldQueryFilter.java, the code doesn't loop through
the same key more than once. It looks like it never expects more than one custom
field in the XML.
________________________________
From: Savannah Beckett
To: user@nutch.apache.org
S...