Lucid Imagination

Found 36,204 results in 0.143 seconds. Displaying page 7 of 3,621.

  1. [nutch-user] Re: obvious duplicates with different hash-values

    Sent 2010-08-23 by Scott Gonyea <me@...>

    Were I to guess, the MD5 hash isn't a hash of the content but, rather, of the CrawlDatum object that Nutch stores.

    Scott

    On Mon, Aug 23, 2010 at 9:11 AM, Andre Pautz wrote:
    > Dear list,
    >
    > I have a problem with removing duplicates from my Nutch index. If I
    > understood it rig...

  2. [nutch-user] obvious duplicates with different hash-values

    Sent 2010-08-23 by Andre Pautz <a-pautz@...>

    Dear list, I have a problem with removing duplicates from my Nutch index. If I understood it correctly, the dedup option should do the work for me, i.e. remove entries with the same URL or the same content (MD5 hash). But unfortunately it doesn't. The strange thing is that if I check the index wi...

  3. [nutch-user] Re: nutch plugin to filter indexing by content!

    Sent 2010-08-23 by Scott Gonyea <me@...>

    Not to my knowledge. You may want to look for where "regex-normalize.xml" is used; you could write a plugin there. It would certainly be useful. I'm looking to eventually do the same, but at index time.

    Scott

    On Mon, Aug 23, 2010 at 8:11 AM, Ahmad Al-Amri wrote: ...

  4. [nutch-user] nutch plugin to filter indexing by content!

    Sent 2010-08-23 by Ahmad Al-Amri <amri_jo@...>

    Hello; while crawling, I want to check whether a web page contains certain words and, if it does, NOT index it and also keep its URL from being added to my crawldb... I just want to ask whether there is a plugin that does this, or something similar I could start from. Thank you.

  5. [nutch-user] Re: Crawl atom, rss, xml .... I need any plugin extra?

    Sent 2010-08-23 by Israel <wegols2@...>

    Great, Volli... thank you very much. Regards, Israel

  6. [nutch-user] RE: Telling Nutch to skip certain Url

    Sent 2010-08-23 by "Nemani, Raj" <Raj.Nemani@...>

    They are intranet URLs, so I went with a generic description; they are not available outside. I start with http://Mydomain.com/guidance/wiki/index.php/sylebook, and I think +^http://Mydomain\.com/guidance/ will work for me. Thank you so much for such a detailed explanation. Thanks again, Raj ---...

  7. [nutch-user] Re: Telling Nutch to skip certain Url

    Sent 2010-08-23 by Volli <illov@...>

    I can't identify your URLs. "http://mysite . Mydomain.com/guidance/wiki/index.php/sylebook." ?? "http://mysite . Mydomain.com/guidance/........" ???? What's the URL you start with? Is it http://Mydomain.com/guidance/ or http://Mydomain.com/guidance/wiki/index...

  8. [nutch-user] Telling Nutch to skip certain Url

    Sent 2010-08-22 by "Nemani, Raj" <Raj.Nemani@...>

    All, I am currently using Nutch to crawl an intranet site. I start the crawl with one seed URL, shown below. http://mysite . Mydomain.com/guidance/wiki/index.php/sylebook. What I would like to do is tell Nutch to skip all the URLs that do not conform to the fol...

  9. [nutch-user] Re: Crawl atom, rss, xml .... I need any plugin extra?

    Sent 2010-08-22 by Volli <illov@...>

    Addendum to my last post, having reread it: all crawls worked with the parse-html parser. I think you don't need to update Nutch. If not: ==>TODO1<== In conf/parse-plugins.xml: --FIND:

  10. [nutch-user] Re: Crawl atom, rss, xml .... I need any plugin extra?

    Sent 2010-08-22 by Volli <illov@...>

    I use Nutch version 1.1 (released 06 June 2010). I didn't install any additional plugins! I think your xml-plugin at NUTCH-185 is outdated: "Resolution:Won't Fix" and "Affects Version/s: 0.7.2, 0.8, 0.8.1". Check your Nutch version (and update it). Check in "nutch-site.xml" at "plugin.inc...
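
Results 1 and 2 discuss deduplication by two keys: the same URL or the same content MD5 hash. The following Python sketch illustrates that idea in isolation; it is not Nutch's dedup code, and the document fields (`url`, `content`, `score`) and the keep-the-highest-score policy are assumptions made for illustration. It also shows why pages that look identical can still hash differently: MD5 is computed over exact bytes, so any difference (an embedded timestamp, a session ID) yields a different digest.

```python
import hashlib

def content_digest(content: bytes) -> str:
    """MD5 hex digest computed over the raw page bytes only."""
    return hashlib.md5(content).hexdigest()

def _best_per(key_fn, docs):
    """Group docs by key_fn and keep the highest-scoring doc in each group."""
    best = {}
    for doc in docs:
        key = key_fn(doc)
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    return list(best.values())

def dedup(docs):
    """Hypothetical two-pass dedup: first by URL, then by content digest."""
    by_url = _best_per(lambda d: d["url"], docs)
    return _best_per(lambda d: content_digest(d["content"]), by_url)

if __name__ == "__main__":
    page = b"<html><body>Nutch</body></html>"
    docs = [
        {"url": "http://example.com/a", "content": page, "score": 1.0},
        {"url": "http://example.com/b", "content": page, "score": 2.0},
        # Same visible text, but one extra byte changes the MD5 digest.
        {"url": "http://example.com/c", "content": page + b"\n", "score": 0.5},
    ]
    print(sorted(d["url"] for d in dedup(docs)))
    # ['http://example.com/b', 'http://example.com/c']
```

The two byte-identical copies collapse to one, while the copy with a single trailing newline survives as a separate document, which is the kind of "obvious duplicate with a different hash" described in result 2.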
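
Ahmad's question in result 4 comes down to a per-page decision: index the page only if it contains none of a given set of words. A minimal Python sketch of that decision, assuming whole-word, case-insensitive matching and a hypothetical `BLOCKED_WORDS` set. Wiring it into Nutch (as a plugin, and keeping the URL out of the crawldb) is not shown and would depend on the Nutch plugin APIs for your version.

```python
import re

# Hypothetical blocklist: pages containing any of these words are not indexed.
BLOCKED_WORDS = {"casino", "lottery"}

def should_index(text, blocked=BLOCKED_WORDS):
    """Return False if any blocked word appears as a whole word (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return words.isdisjoint(w.lower() for w in blocked)

if __name__ == "__main__":
    print(should_index("Crawling an intranet with Nutch"))  # True
    print(should_index("Win the LOTTERY today!"))           # False
```

Tokenizing into whole words avoids false hits on substrings (a page mentioning "lotteryish" is still indexed); a plain substring test would be simpler but stricter.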
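
Results 6 to 8 turn on Nutch's regex URL filter, where each rule line starts with `+` (include) or `-` (exclude) followed by a regular expression, and the first matching rule decides; a URL that matches no rule is ignored, as described in the comments of Nutch's default filter file (check this against your version). The Python sketch below emulates that first-match logic so the effect of Raj's `+^http://Mydomain\.com/guidance/` rule can be tried out; `parse_rules` and `accept` are hypothetical helper names, not Nutch APIs.

```python
import re

# Rule lines in the "+regex" / "-regex" format of a Nutch regex URL filter file.
RULES = [
    r"+^http://Mydomain\.com/guidance/",  # accept anything under /guidance/
]

def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, compiled_regex) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accept(url, rules):
    """First matching rule decides; a URL that matches no rule is rejected."""
    for is_accept, regex in rules:
        if regex.search(url):  # unanchored search; use ^ / $ to anchor
            return is_accept
    return False

if __name__ == "__main__":
    rules = parse_rules(RULES)
    print(accept("http://Mydomain.com/guidance/wiki/index.php/sylebook", rules))  # True
    print(accept("http://Mydomain.com/other/page.html", rules))                  # False
```

Because evaluation stops at the first match, rule order matters: an exclusion such as `-\.pdf$` placed before the `+` rule removes PDFs even under `/guidance/`, whereas placed after it would never be reached for those URLs.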

Apache Solr, Apache Lucene, ApacheCon and their logos are trademarks of the Apache Software Foundation.

© 2010 Lucid Imagination. All rights reserved.