Sent 2010-08-20 by Israel <wegols2@...>
Hello, does anyone know how I can run a search on this RSS feed:
http://ocw.mit.edu/rss/all/mit-allcourses-1.xml
How do I configure the "crawl-urlfilter", and should I add plugins to
"nutch-site."
Sent 2010-08-20 by Volli <illov@...>
Hello, this is my first message to the Nutch mailing list. I
hope I am sending it to the right recipient.
In nutch-1.1 I checked nutch-default.xml for new properties.
There I found "crawl.gen.delay" with default value "604800000".
Description says "This value, expressed in days ... Default
value of this is ...
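As a quick sanity check on the units: if the default "604800000" is read as milliseconds (an assumption; the description quoted above speaks of days), it works out to exactly 7 days. A minimal Python sketch of that conversion, where `ms_to_days` is a hypothetical helper name:

```python
# Assumption: the quoted default is a duration in milliseconds, not days.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day


def ms_to_days(ms: int) -> float:
    """Convert a duration in milliseconds to days."""
    return ms / MS_PER_DAY


default_value = 604800000  # default quoted in the message above
print(ms_to_days(default_value))  # prints 7.0
```

Read as days, the same number would be implausibly large, which suggests the description and the default use different units.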
Sent 2010-08-20 by Roger Marin <rsmaniak@...>
OK, so now the plugin is working: it changes the analyzer to the
SnowballAnalyzer. But when I parse the query, some letters end up
being stripped. For instance, if I search for "exchanges" it gets turned
into "exchang" and of course I don't get any results. What could be the
cause of this? as fa...
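One possible cause (an assumption, since the message is cut off) is that the index was built without stemming while the query is stemmed, so the stem "exchang" never matches the indexed term "exchanges". The Python sketch below illustrates the mismatch. `toy_stem` is a hypothetical stand-in, not the real Snowball algorithm, and the two documents are made up for illustration.

```python
def toy_stem(word: str) -> str:
    """Toy stand-in for a stemmer; NOT the real Snowball algorithm."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]  # drop a plural "s"
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]  # drop a trailing "e"
    return word


def build_index(docs, analyze):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


# Hypothetical documents, for illustration only.
docs = {1: "stock exchanges worldwide", 2: "currency exchange rates"}

raw_index = build_index(docs, str.lower)     # indexed without stemming
stemmed_index = build_index(docs, toy_stem)  # indexed with the same stemmer

query_term = toy_stem("exchanges")           # the query side is stemmed
print(query_term)
print(raw_index.get(query_term, set()))      # no hit: index holds unstemmed terms
print(stemmed_index.get(query_term, set()))  # both documents match
```

If this is what is happening, re-indexing with the same analyzer that is used at query time would be the thing to try.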
Sent 2010-08-20 by Gonzalo Aguilar Delgado <gaguilar@...>
Hi Alex,
I will answer inline so we can follow comments...
On Thu, 2010-08-19 at 19:21 +0100, Alex McLintock wrote:
> Hello Gonzalo,
>
> Did you mean to post to the dev list?
Yes! Users normally don't know what to implement if missing features...
> Further comments inline
>
> On 19 Augus...
Sent 2010-08-19 by Roger Marin <rsmaniak@...>
Hello,
Is it possible to change the Lucene analyzer that Nutch uses by default? I
would like to use the Snowball analyzer to search and crawl. I tried
creating a plugin based on the analysis-fr and analysis-de plugins, but it
didn't work. I'm not sure if I need to create a plugin for querying too.
I w...
Sent 2010-08-19 by Israel <wegols2@...>
Hi Peter, I read your tutorial on installing Nutch. I installed it and
everything works great ... but I have a big question.
When I run the crawler, for example in the url directory I have a *.txt file
that contains:
http://www.opentechlearning.com/
And inside the folder 'conf' there are...
Sent 2010-08-19 by Israel <wegols2@...>
I put this:
+^http://cnx.org/lenses/ccotp/endorsements/atom
but when I run the search, nothing appears.
Hi Peter, I read your tutorial on installing Nutch. I installed it and
everything works great ... but I have a big question.
When I run the crawler, for example in the url directory I ha...
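To make the filter rule quoted in this message concrete, here is a minimal Python sketch of how an ordered list of "+"/"-" regex rules can be evaluated. It assumes the first matching rule decides and that a URL matching no rule is rejected. That is my understanding of how Nutch's regex URL filter behaves, but treat it as an assumption. `parse_rules` and `accepts` are hypothetical names, and the second test URL is made up.

```python
import re


def parse_rules(lines):
    """Turn lines like '+^http://...' into (accept, compiled_regex) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; reject if none match."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


rules = parse_rules(["+^http://cnx.org/lenses/ccotp/endorsements/atom"])

# The feed URL itself passes the filter.
print(accepts(rules, "http://cnx.org/lenses/ccotp/endorsements/atom"))
# A hypothetical URL elsewhere on the same host does not.
print(accepts(rules, "http://cnx.org/content/"))
```

Under these assumptions, a single rule this narrow admits the feed URL but rejects every link inside the feed that points elsewhere, which could be one reason a search returns nothing.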
Sent 2010-08-19 by "Nemani, Raj" <Raj.Nemani@...>
Hi all,
I am using the following script to do my re-crawl. It is basically a
slightly modified version of the script found here:
http://wiki.apache.org/nutch/Crawl
I have a small site that I would like to crawl using this script, maybe
3 times a day, on a Windows server by schedulin...
Sent 2010-08-19 by AJ Chen <ajchen@...>
I found that more disk space is required during indexing. So, for a slave node with
limited space, building a smaller index, e.g. 2M pages instead of 10M pages,
can avoid the disk space error.
A related question: after crawling/indexing for some time, each slave node
accumulates lots of files (under hdfs/...
Sent 2010-08-19 by Israel <wegols2@...>
Hello, does anyone know whether Nutch works with semantic technologies, and if so, how?
Thanks