Enterprise Search support for Apache Lucene and Solr by Lucid Imagination

Found 29,425 results in 0.1 seconds. Displaying page 10 of 2,943.

  1. [nutch-user] Re: Nutch v0.4

    Sent 2010-02-25 by Andrzej Bialecki <ab@...>

    On 2010-02-24 17:34, Pedro Bezunartea López wrote: > Hi Ashley, > > Hi, >> I'm looking to reproduce program analysis results based on Nutch v0.4. I >> realize this is a very old release, but is it possible to obtain the source >> from somewhere? I see some of the classes I'm looking for in v0.7,...

  2. [nutch-user] Re: regex-urlfilter.txt and paging variables

    Sent 2010-02-25 by "Andreas P. Koenzen" <akoenzen@...>

    Replace it with this: -[@!*] That's it... Best regards, --- Andreas P. Koenzen On 25/02/2010, at 03:06 a.m., Ian M. Evans wrote: > I suck at regex and in keeping with the Olympic spirit, I probably > suck > at giant slalom too. > > In the regex-urlfilter.txt there's the suggested probable ...

  3. [nutch-user] Re: regex-urlfilter.txt and paging variables

    Sent 2010-02-25 by MilleBii <millebii@...>

    You can add a specific rule before that exclusion rule. Something like: +.*/?page=.* 2010/2/25, Ian M. Evans: > I suck at regex and in keeping with the Olympic spirit, I probably suck > at giant slalom too. > > In the regex-urlfilter.txt there's the suggested probable q...

  4. [nutch-user] Re: Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

    Sent 2010-02-25 by Bradford Stephens <bradfordstephens@...>

    Thanks for coming, everyone! We had around 25 people. A *huge* success, for Seattle. And a big thanks to 10gen for sending Richard. Can't wait to see you all next month. On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens wrote: > The Seattle Hadoop/Scalability/NoSQL...

  5. [nutch-user] regex-urlfilter.txt and paging variables

    Sent 2010-02-25 by "Ian M. Evans" <ianevans@...>

    I suck at regex and in keeping with the Olympic spirit, I probably suck at giant slalom too. In the regex-urlfilter.txt there's the suggested probable queries exclude of: -[?*!@=] My only problem is that there's a couple of areas of the site that use, for example, ?page=2 for paging through th...

  6. [nutch-user] reduce copier failed error at various stages of nutch processing

    Sent 2010-02-24 by Yves Petinot <yves@...>

    Hi, I was wondering if someone else on the list has been experiencing an issue similar to the one below. I'm running 2 independent crawls on a single hadoop cluster and am regularly getting "reduce copier failed" errors. Most of the time Nutch is able to recover from these errors, but every ...

  7. [nutch-user] Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

    Sent 2010-02-24 by Bradford Stephens <bradfordstephens@...>

    The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup is tonight! We're going to have a guest speaker from MongoDB :) As always, it's at the University of Washington, Allen Computer Science building, Room 303 at 6:45pm. You can find a map here: http://www.washington.edu/home/maps...

  8. [nutch-user] Re: Crawling site, but only indexing certain pages

    Sent 2010-02-24 by Magnús Skúlason <maggias@...>

    Hi, This is actually very easy, just create an indexing plugin, analyse the URL format and return null from the indexing plugin if you don't want to index it. best regards, Magnus On Wed, Feb 24, 2010 at 6:09 PM, Steven Wichers wrote: > On some of the sites I want to ind...

  9. [nutch-user] Crawling site, but only indexing certain pages

    Sent 2010-02-24 by Steven Wichers <steven@...>

    On some of the sites I want to index with nutch, there are only specific types of pages I would like to be searchable. I need a way to be able to crawl these sites, but only index pages that match a certain regular expression. ex: www.example.com/browse/ finds links in the form of www.example.c...

  10. [nutch-user] Re: Nutch v0.4

    Sent 2010-02-24 by Pedro Bezunartea López <pedro@...>

    Hi Ashley, Hi, > I'm looking to reproduce program analysis results based on Nutch v0.4. I > realize this is a very old release, but is it possible to obtain the source > from somewhere? I see some of the classes I'm looking for in v0.7, but I > need the older version to confirm it. > Thanks, > A...
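
The regex-urlfilter.txt thread (results 2, 3 and 5) turns on rule order: the first rule that matches a URL decides whether it is kept, so an accept rule for ?page= has to sit before the -[?*!@=] exclusion. Below is a minimal Python sketch of that behaviour, assuming first-match-wins evaluation with unanchored matching. It is not Nutch code: the function names, the example URLs and the trailing "+." catch-all are assumptions added for illustration. The last function mirrors the advice in result 8 (return null to skip indexing) with a plain dict standing in for a document and a hypothetical /item/ pattern, since the real pattern is cut off in result 9.

```python
import re


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        parsed.append((sign == "+", re.compile(pattern)))
    return parsed


def accepts(url, rules):
    """First rule whose pattern is found in the URL decides; no match rejects."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Rule suggested in result 3, placed before the exclusion quoted in result 5.
# The final "+." (accept anything else) is an assumption for this sketch.
PAGING_FIRST = parse_rules(["+.*/?page=.*", "-[?*!@=]", "+."])

# Alternative from result 2: drop '?' and '=' from the exclusion class.
NARROWED = parse_rules(["-[@!*]", "+."])


def filter_document(doc, url, pattern):
    """Hypothetical index-side check: return None to skip, else the document."""
    return doc if pattern.search(url) else None


if __name__ == "__main__":
    for url in ("http://www.example.com/list?page=2",
                "http://www.example.com/search?q=x",
                "http://www.example.com/about.html"):
        print(url, accepts(url, PAGING_FIRST), accepts(url, NARROWED))
```

Note the trade-off between the two suggestions: putting the paging rule first keeps other query URLs excluded, while narrowing the character class lets every URL containing ? or = through.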

Apache Solr, Apache Lucene, ApacheCon and their logos are trademarks of the Apache Software Foundation.

© 2010 Lucid Imagination. All rights reserved.