Lucid Imagination

Found 36,204 results in 0.019 seconds. Displaying page 2 of 3,621.

  1. [nutch-user] Re: Nutch crawl failure

    Sent 2010-09-02 by Markus Jelsma <markus.jelsma@...>

    http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F On Thursday 02 September 2010 11:25:51 Nayanish Hinge wrote: > Hi, > I have a doubt, not sure if anybody has already thought about it > What if nutch crawler fails during its crawling cycles, could we restart > the c...

  2. [nutch-user] depth information not being available in crawl datum

    Sent 2010-09-02 by Nayanish Hinge <nayanish.hinge@...>

    Hi, I have a specific use case where I need to know at which level (depth) I fetched the current url. Currently the depth could be figured out from the for loop index in the crawl.java. But my use case necessitate me to have this information stored in crawl-datum. Currently Nutch does not have an...

  3. [nutch-user] Nutch crawl failure

    Sent 2010-09-02 by Nayanish Hinge <nayanish.hinge@...>

    Hi, I have a doubt, not sure if anybody has already thought about it What if nutch crawler fails during its crawling cycles, could we restart the crawling right from where we left? I mean, starting with only the unfetched urls. Thanks -- Nayanish Hyderabad

  4. [nutch-user] RE: Why do nutch has Content Parsing in two places

    Sent 2010-09-02 by Markus Jelsma <markus.jelsma@...>

    In small crawls, you could parse the document right away. For large crawls, however, there may not be enough resources to fetch and parse at the same time. -----Original message----- From: Nayanish Hinge Sent: Thu 02-09-2010 07:39 To: user@nutch.apache.org; Subject: ...

  5. [nutch-user] Why do nutch has Content Parsing in two places

    Sent 2010-09-02 by Nayanish Hinge <nayanish.hinge@...>

    Hi, I was wondering, why nutch has an option of parsing 1. right within the fetcher and 2. also as a separate map-reduce job In Crawl.java, There is a separate step for crawling. But also based on "fetcher.parse" property in nutch-default.xml, Fetcher will also parse the content. Thanks -- Nay...

  6. [nutch-user] Nutch redirects.

    Sent 2010-09-01 by Mark Stephenson <mstephen@...>

    Hi, I am new to Nutch and I'm trying to understand how it handles redirects. Let's say I want to fetch the following article from the New York Times: http://www.nytimes.com/2010/08/30/opinion/30mon1.html That is the only URL I put in my 'urls' directory. Then I issue the following comm...

  7. [nutch-user] Selective Fetching and Notifying When Files Have Been Modifed Since Last Fetch

    Sent 2010-09-01 by <onlinespending@...>

    Hi, I'd like to use Nutch to crawl a very limited set of pages. But as it's crawling I'd like for it to only fetch particular pages and files that match certain criteria. I'd also like that I am somehow alerted when any of these fetched files have been modified (modify date of the file or ...

  8. [nutch-user] Re: performance for small cluster

    Sent 2010-09-01 by AJ Chen <ajchen@...>

    in distributed mode, "generate -topN 1000000 -maxNumSegments 3" creates 3 segments, but the size is very uneven: 1.7M, 0.8M, 0.5M. I also tried fetcher.timelimit.mins=240 in distributed mode. but the fetcher did not stop after 4 hours. any idea? -aj On Tue, Aug 31, 2010 at 4:24 PM, AJ Chen

  9. [nutch-user] Nutch 1.1 Crawl is slow,hangs and aborts eventually

    Sent 2010-09-01 by "Nemani, Raj" <Raj.Nemani@...>

    All, I am crawling a site that is heavy in rtf, txt and pdf documents in addition to pages that embed a lot of images. I am using Nutch 1.1 and running on Windows 7. I am seeing the following errors in my hadoop logs. 2010-09-01 15:01:26,509 INFO parse.ParserFactory - The parsing p...

  10. [nutch-user] Re: Write plugin in my own package with Nutch as a jar

    Sent 2010-09-01 by Volli <illov@...>

    in my bookmarks I found this: http://efreedom.com/Question/1-3310050/ very short, but author is mentioned. i'm (yet?) no java-developper. so, further questions to the community ;-)) Am 01.09.2010 15:20, schrieb jitendra rajput: > Hi, > > I have gone through the tutorial about writing plugin i...

Apache Solr, Apache Lucene, ApacheCon and their logos are trademarks of the Apache Software Foundation.

© 2010 Lucid Imagination. All rights reserved.