Lucid Imagination


Found 36,202 results in 0.014 seconds. Displaying page 8 of 3,621.

  1. [nutch-user] Re: Crawl atom, rss, xml .... I need any plugin extra?

    Sent 2010-08-21 by Israel <wegols2@...>

    2010/8/21 Israel: > Thanks for your help, please help me with this. > Hello, I downloaded the parse plugin from "https://issues.apache.org/jira/browse/NUTCH-185", and I don't know where to put this: >> Added to "parse-plugins.xml" ...

  2. [nutch-user] Re: Crawl atom, rss, xml .... I need any plugin extra?

    Sent 2010-08-21 by Israel <wegols2@...>

    Thanks for your help, please help me with this. Hello, I downloaded the parse plugin from "https://issues.apache.org/jira/browse/NUTCH-185", and I don't know where to put this: > Added to "parse-plugins.xml"

    [nutch-user] Re: Crawl atom, rss, xml .... I need any plugin extra?

    Sent 2010-08-20 by Volli <illov@...>

    Nutch 1.1. I tested just with "http://cnx.org/lenses/ccotp/endorsements/atom" I added to property "plugin.includes" in "nutch-site.xml" "...parse-(text|html|js|tika|pdf|rss)|feed|..." (see added "rss" and "feed"; I don't know which one did it). Added to "parse-plugins.xml"

    [nutch-user] Crawl atom, rss, xml .... I need any plugin extra?

    Sent 2010-08-20 by Israel <wegols2@...>

    Hello, I tried to index these pages, which use XML, RSS, Atom, or even RDF or a similar format, but errors occur. I downloaded the "parse xml" plugin but I don't know how to use it. I index these pages: http://cnx.org/lenses/ccotp/endorsements/atom http://ocw.nd.edu/courselist/rss http...

  3. [nutch-user] Re: How to configure nutch crawl-and-site urlfilter

    Sent 2010-08-20 by Volli <illov@...>

    I found this post. I didn't read it in detail. So, just a maybe. http://www.mail-archive.com/nutch-user@lucene.apache.org/msg12429.html On 20.08.2010 20:08, Israel wrote: > Hello, does anyone know how I can do a search on these rss: > > http://ocw.mit.edu/rss/all/mit-allcourses-1.xml > > how do I...

  4. [nutch-user] Re: Deep crawl with subdomains

    Sent 2010-08-20 by Scott Gonyea <scott@...>

    I haven't really focused my time on subdomains. I think I saw some in my crawl data, but can't confirm ATM. One question is, are you putting "www." in your injected urls... Or just http://[domain]? If that doesn't make a difference, then it would seem to me that the regex handler should be the ta...

  5. [nutch-user] Re: Deep crawl with subdomains

    Sent 2010-08-20 by AJ Chen <ajchen@...>

    It may seem slow if you put 5000 domains or paths in regex-urlfilter. But, after you try it, you may find the performance acceptable. It works for me anyway. -aj On Fri, Aug 20, 2010 at 12:12 PM, Sonal Goyal wrote: > Hi, > > I have a list of about 5000 URLs which I need...

  6. [nutch-user] Deep crawl with subdomains

    Sent 2010-08-20 by Sonal Goyal <sonalgoyal4@...>

    Hi, I have a list of about 5000 URLs which I need to crawl and fetch using Nutch. I want to do a very deep crawl on each and I want subdomains, but I don't want external links. If I set db.ignore.external.links, I don't get the subdomains. So I can't use that. If I set the domain in regex-urlfilter...

  7. [nutch-user] How to configure nutch crawl-and-site urlfilter

    Sent 2010-08-20 by Israel <wegols2@...>

    Hello, does anyone know how I can do a search on this RSS feed: http://ocw.mit.edu/rss/all/mit-allcourses-1.xml How do I configure the "crawl-urlfilter", and should I add plugins to "nutch-site."

  8. [nutch-user] Configuration, nutch-default.xml, property crawl.gen.delay with default value 604800000

    Sent 2010-08-20 by Volli <illov@...>

    Hello, this is my first message to the nutch mailing list. I hope I am sending it to the right recipient. In nutch-1.1 I checked nutch-default.xml for new properties. There I found "crawl.gen.delay" with default value "604800000". Description says "This value, expressed in days ... Default value of this is ...
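
The default quoted in the last message is easier to judge once converted to days. The following is only a minimal sketch: it assumes the number is a duration in milliseconds (the quoted description mentions days, so the unit is an assumption here, not something the message confirms), and the helper name `ms_to_days` is hypothetical.

```python
# Assumption: the quoted crawl.gen.delay default is a duration in milliseconds.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def ms_to_days(milliseconds: int) -> float:
    """Convert a duration in milliseconds to days (hypothetical helper)."""
    return milliseconds / MS_PER_DAY


default_delay_ms = 604800000  # default value quoted in the message
print(ms_to_days(default_delay_ms))  # 7.0
```

Under that assumption the default corresponds to one week, whereas reading the same number as a count of days would give a far longer interval.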


Apache Solr, Apache Lucene, ApacheCon and their logos are trademarks of the Apache Software Foundation.

© 2010 Lucid Imagination. All rights reserved.