Sent 2010-02-25 by Andrzej Bialecki <ab@...>
On 2010-02-24 17:34, Pedro Bezunartea López wrote:
> Hi Ashley,
>
> Hi,
>> I'm looking to reproduce program analysis results based on Nutch v0.4. I
>> realize this is a very old release, but is it possible to obtain the source
>> from somewhere? I see some of the classes I'm looking for in v0.7,...
Sent 2010-02-25 by "Andreas P. Koenzen" <akoenzen@...>
Replace it with this: -[@!*]
That's it...
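
A minimal sketch of what this change does, written in Python purely for illustration (Nutch itself is Java, and the example URLs are made up). It only compares the two character classes: the stock class fires on any URL containing `?` or `=`, while the replacement does not.

```python
import re

STOCK = re.compile(r"[?*!@=]")   # character class from the stock exclusion rule
RELAXED = re.compile(r"[@!*]")   # the replacement class suggested above

def rejected_by(pattern, url):
    """True if the exclusion pattern is found anywhere in the URL."""
    return pattern.search(url) is not None

paging_url = "http://www.example.com/list?page=2"  # hypothetical URL
print(rejected_by(STOCK, paging_url))    # stock rule rejects the paging URL
print(rejected_by(RELAXED, paging_url))  # relaxed rule lets it through
```

Note the trade-off: the relaxed class lets every query string through, not only `?page=` ones.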
Best regards,
---
Andreas P. Koenzen
On 25/02/2010, at 03:06 a.m., Ian M. Evans wrote:
> I suck at regex and in keeping with the Olympic spirit, I probably suck
> at giant slalom too.
>
> In the regex-urlfilter.txt there's the suggested probable ...
Sent 2010-02-25 by MilleBii <millebii@...>
You can add a specific rule before that exclusion rule
Something like :
+.*/?page=.*
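
A rough Python sketch of why rule order matters, assuming the filter tries rules top to bottom, lets the first matching rule decide, and rejects a URL that matches no rule. The rule list, the `url_allowed` helper, and the URLs are hypothetical. One caveat: the `?` is escaped here, because an unescaped `/?` in a regex means "optional slash", so the rule as written would match any URL containing `page=`.

```python
import re

# Hypothetical rule list in regex-urlfilter.txt style: "+" accepts, "-" rejects.
RULES = [
    r"+.*\?page=.*",  # allow paging URLs (note the escaped "?")
    r"-[?*!@=]",      # the "probable queries" exclusion
    r"+.",            # accept everything else
]

def url_allowed(url, rules=RULES):
    """First matching rule decides; a URL matching no rule is rejected."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False

print(url_allowed("http://www.example.com/list?page=2"))   # accepted by rule 1
print(url_allowed("http://www.example.com/search?q=foo"))  # rejected by rule 2
```

If the `+` rule were placed after the exclusion rule, the exclusion would match first and the paging URL would still be rejected.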
2010/2/25, Ian M. Evans :
> I suck at regex and in keeping with the Olympic spirit, I probably suck
> at giant slalom too.
>
> In the regex-urlfilter.txt there's the suggested probable q...
Sent 2010-02-25 by Bradford Stephens <bradfordstephens@...>
Thanks for coming, everyone! We had around 25 people. A *huge*
success, for Seattle. And a big thanks to 10gen for sending Richard.
Can't wait to see you all next month.
On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens
wrote:
> The Seattle Hadoop/Scalability/NoSQL...
Sent 2010-02-25 by "Ian M. Evans" <ianevans@...>
I suck at regex and in keeping with the Olympic spirit, I probably suck
at giant slalom too.
In the regex-urlfilter.txt there's the suggested probable queries
exclude of:
-[?*!@=]
My only problem is that there's a couple of areas of the site that use,
for example, ?page=2 for paging through th...
Sent 2010-02-24 by Yves Petinot <yves@...>
Hi,
I was wondering if someone else on the list has been experiencing an
issue similar to the one below. I'm running 2 independent crawls on a
single hadoop cluster and am regularly getting "reduce copier failed"
errors. Most of the time Nutch is able to recover from these errors, but
every ...
Sent 2010-02-24 by Bradford Stephens <bradfordstephens@...>
The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup
is tonight! We're going to have a guest speaker from MongoDB :)
As always, it's at the University of Washington, Allen Computer
Science building, Room 303 at 6:45pm. You can find a map here:
http://www.washington.edu/home/maps...
Sent 2010-02-24 by Magnús Skúlason <maggias@...>
Hi,
This is actually very easy: just create an indexing plugin, analyse the URL
format, and return null from the indexing plugin if you don't want to index
it.
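
The control flow can be sketched as follows. This is Python for illustration only, not the Nutch plugin API (a real plugin would be written in Java), and the `INDEXABLE` pattern, the `/item/` path, and the `filter_document` name are all hypothetical.

```python
import re

# Hypothetical pattern: only pages under /item/ should be searchable.
INDEXABLE = re.compile(r"^https?://www\.example\.com/item/")

def filter_document(doc, url):
    """Return the document to index it, or None to skip it.

    Skipping only affects indexing; the page is still fetched and its
    links are still followed by the crawl.
    """
    if INDEXABLE.search(url):
        return doc
    return None

doc = {"title": "Some item"}
print(filter_document(doc, "http://www.example.com/item/42"))   # indexed
print(filter_document(doc, "http://www.example.com/browse/"))   # skipped
```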
best regards,
Magnus
On Wed, Feb 24, 2010 at 6:09 PM, Steven Wichers wrote:
> On some of the sites I want to ind...
Sent 2010-02-24 by Steven Wichers <steven@...>
On some of the sites I want to index with nutch, there are only
specific types of pages I would like to be searchable. I need a way to
be able to crawl these sites, but only index pages that match a
certain regular expression.
ex:
www.example.com/browse/ finds links in the form of
www.example.c...
Sent 2010-02-24 by Pedro Bezunartea López <pedro@...>
Hi Ashley,
Hi,
> I'm looking to reproduce program analysis results based on Nutch v0.4. I
> realize this is a very old release, but is it possible to obtain the source
> from somewhere? I see some of the classes I'm looking for in v0.7, but I
> need the older version to confirm it.
> Thanks,
> A...