Lucid Imagination

Search Results

Found 36,204 results in 0.143 seconds. Displaying page 1 of 3,621.

  1. [nutch-user] Re: depth information not being available in crawl datum

    Sent 2010-09-02 by Julien Nioche <lists.digitalpebble@...>

    Hi, You could track the depth of a URL from the seeds by implementing a custom ScoringFilter. ScoringFilters are called at various points of the workflow, including when outlinks have been found for a page. The logic would be to simply increment the depth of the current page and generate a metad...

  2. [nutch-user] Re: performance for small cluster

    Sent 2010-09-02 by AJ Chen <ajchen@...>

    The other option for reducing time in fetching the last 1% urls may be using a smaller queue size, I think. In Fetcher class, the queue size is magically determined as threadCount * 50. feeder = new QueueFeeder(input, fetchQueues, threadCount * 50); Is there any good reason for factor 50? If...

  3. [nutch-user] Re: performance for small cluster

    Sent 2010-09-02 by AJ Chen <ajchen@...>

    Thanks Ken for the tips. -aj On Wed, Aug 18, 2010 at 9:17 AM, Ken Krugler wrote: > Hi AJ, > > > On Aug 18, 2010, at 7:26am, AJ Chen wrote: > > Thanks for the explanation. I'm using hdfs. what config parameters may >> help >> speed up shuffling, merging, sorting a...

  4. [nutch-user] Fwd: Selective Fetching and Notifying When Files Have Been Modifed Since Last Fetch

    Sent 2010-09-02 by Sonal Goyal <sonalgoyal4@...>

    Thanks and Regards, Sonal www.meghsoft.com http://in.linkedin.com/in/sonalgoyal ---------- Forwarded message ---------- From: Sonal Goyal Date: Thu, Sep 2, 2010 at 10:33 PM Subject: Re: Selective Fetching and Notifying When Files Have Been Modifed Since Last Fetch To: us...

  5. [nutch-user] Re: Not getting all documents

    Sent 2010-09-02 by Gingras Jean-François <Jean-Francois.Gingras@...>

    Hi, You may want to look for the db.max.outlinks.per.page property in your nutch-[default|site].xml configuration file. The default is 100 outlinks in nutch 1.0. So, if an index page contains more than 100 links to PDF files, then only a maximum of 100 will be processed for each index page. Als...

  6. [nutch-user] Custom HTTP status handling for throttling

    Sent 2010-09-02 by Nayanish Hinge <nayanish.hinge@...>

    Hi, Some websites return HTTP 503 when they throttle hits. I see that I need to re-implement HttpBase.java to handle this as a special case and add retry logic (with some exponential back-off). But in order to get HttpBase used by protocol-http and protocol-httpclient, we need to override th...

  7. [nutch-user] Trying to applu timeout.patch on 1.1 source

    Sent 2010-09-02 by "Nemani, Raj" <Raj.Nemani@...>

    As part of the following problem (I have posted this already and would appreciate any help), I am trying to apply timeout.patch using patch.exe (from Unix Utils) on Windows 7 64 bit. Both patch.exe and timeout.patch files are in the top level folder of the 1.1 source files (i.e. the top level folder ...

  8. [nutch-dev] Re: Nutch 2.0 Help

    Sent 2010-09-02 by Julien Nioche <lists.digitalpebble@...>

    Hi David, I haven't used the Hbase backend with GORA for quite some time but from what I can remember you'll need the following things : * conf/hbase-site.xml => this should correspond to your local configuration * conf/gora-hbase-mapping.xml => see below * conf/gora.properties => don't think t...

  9. [nutch-dev] Nutch 2.0 Help

    Sent 2010-09-02 by David Stuart <david.stuart@...>

    Hey All, I have setup the latest version nutch from trunk and am running into a few issues with hbase and injecting urls. when I run the command runtime/local/bin/nutch inject runtime/local/seed/ I get InjectorJob: java.lang.RuntimeException: Could not create datastore at org.apache....

  10. [nutch-user] Re: Nutch redirects.

    Sent 2010-09-02 by Andrzej Bialecki <ab@...>

    On 2010-09-02 02:45, Mark Stephenson wrote: > Hi, > > I am new to Nutch and I'm trying to understand how it handles redirects. > Let's say I want to fetch the following article from the New York Times: > > http://www.nytimes.com/2010/08/30/opinion/30mon1.html > > That is the only URL I put in my ...
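
Result 6 above mentions retrying on HTTP 503 with exponential back-off. The sketch below illustrates that general idea only; it is not Nutch code and does not touch HttpBase. The class name BackoffSketch, its methods, and the base/cap parameters are all assumptions made up for this example, and the request is modelled as a plain IntSupplier that returns a status code.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper, not part of Nutch: retries a request while the server answers 503. */
final class BackoffSketch {

    /** Delay before retry number `attempt` (0-based): baseMillis * 2^attempt, capped at capMillis. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        // Clamp the exponent so the product stays in range for realistic base delays.
        int shift = Math.min(attempt, 30);
        long delay = baseMillis * (1L << shift);
        return Math.min(delay, capMillis);
    }

    /**
     * Calls `request` once, then up to `maxRetries` more times while it keeps returning 503,
     * sleeping for an exponentially growing delay between calls. Returns the last status seen.
     */
    static int fetchWithRetry(IntSupplier request, int maxRetries, long baseMillis, long capMillis) {
        int status = request.getAsInt();
        for (int attempt = 0; attempt < maxRetries && status == 503; attempt++) {
            try {
                Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up on further retries.
                Thread.currentThread().interrupt();
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated server (an assumption for the demo): throttles twice, then succeeds.
        int[] calls = {0};
        int status = fetchWithRetry(() -> ++calls[0] < 3 ? 503 : 200, 5, 10, 1000);
        System.out.println("final status " + status + " after " + calls[0] + " calls");
    }
}
```

Capping the delay keeps a long run of 503 responses from stalling a fetcher thread indefinitely; a real crawler would probably also add random jitter and honour a Retry-After header when the server sends one.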


Apache Solr, Apache Lucene, ApacheCon and their logos are trademarks of the Apache Software Foundation.

© 2010 Lucid Imagination. All rights reserved.