
Lucid Imagination Forum » LucidWorks Enterprise

crawl delay and num threads are not working in defaults.yml

(9 posts) (4 voices)
  • Started 5 months ago by senthilkumar.arumugam
  • Latest reply from Andrzej Bialecki

  1. senthilkumar.arumugam
    Member

    Hi,

    We recently started evaluating LucidWorks Enterprise as a replacement for our existing search engine. I am looking for the configuration settings that limit the crawl rate at the individual collection level. Even changes to the global defaults.yml do not seem to take effect. I modified http.crawl.delay and http.num.threads in defaults.yml and tested http.crawl.delay with values of 1000 and 10000, but I do not see any difference in crawl speed. Can anyone please help me identify the parameters that control the crawl rate?

    Thanks in advance.

    Posted 5 months ago #
  2. senthilkumar.arumugam
    Member

    To clarify, I want to control the crawl speed at the individual collection level. Please let me know if this question is not suitable for this forum.

    Posted 5 months ago #
  3. senthilkumar.arumugam
    Member

    Hi,

    Can anyone help me out here?

    Posted 5 months ago #
  4. Mark Miller
    Moderator

    Hi senthilkumar.arumugam - did you restart LWE after changing these settings? Are you using LWE 2.0?

    Posted 5 months ago #
  5. srdshanmugavel
    Member

    Yes, Mark. We restarted LWE each time we updated defaults.yml or collections.yml.

    We are using LWE 2.0.

    Posted 5 months ago #
  6. senthilkumar.arumugam
    Member

    Hi Mark,

    Do you think anything else could be wrong? I modified the parameters and crawled different sites for an hour. The number of documents crawled in that hour was almost the same. If you can suggest any other way of testing this, that would be very helpful.

    Posted 4 months ago #
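
The hour-long comparison described in the post above can be made a little more concrete with some arithmetic. What follows is only a minimal sketch, not part of LWE: the function names are hypothetical, the document counts are assumed to be read off by hand at the start and end of a run, and the delay is assumed to be in milliseconds and to apply between consecutive requests to one host from a single thread (the thread does not state the unit).

```python
def docs_per_hour(count_start, count_end, elapsed_seconds):
    """Observed crawl rate from two document counts taken elapsed_seconds apart."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_end - count_start) * 3600.0 / elapsed_seconds


def max_docs_per_hour(delay_ms):
    """Upper bound on fetches per hour for one host and one thread.

    ASSUMPTION: delay_ms is in milliseconds and is applied between
    consecutive requests; the forum thread does not confirm the unit.
    """
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    return 3600.0 * 1000.0 / delay_ms


def delay_looks_ignored(observed_rate, delay_ms):
    """True when the observed rate exceeds what the delay should allow."""
    return observed_rate > max_docs_per_hour(delay_ms)
```

Under these assumptions, a delay of 1000 allows at most 3600 fetches per hour from one host and a delay of 10000 allows at most 360, so two runs that produce almost the same count suggest the setting is not being applied.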
  7. Andrzej Bialecki
    Moderator

    Hi,

    These options do appear in defaults.yml in LWE 2.0, but they are inactive, which means that rate limiting of a crawl is not supported in LWE 2.0. Sorry for the confusion.

    Posted 4 months ago #
  8. senthilkumar.arumugam
    Member

    Hi,

    Thank you for the update. Is there a rough estimate of when it will be supported?

    Posted 4 months ago #
  9. Andrzej Bialecki
    Moderator

    We will look into supporting crawl-delay in upcoming releases, but I can't give you any estimate about when... Multi-threaded Web crawling won't be supported anytime soon - we would have to replace the Aperture-based crawler with something else. If you really need high-volume high-speed crawling I'd suggest using Nutch - LucidWorks integrates nicely with Nutch.

    Posted 4 months ago #
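
For readers unfamiliar with the term, a crawl delay is a minimum pause between consecutive requests to the same host. The sketch below only illustrates that idea for a single-threaded fetch loop. It is not how LWE, Aperture, or Nutch implement it; the class name and its parameters are hypothetical, and treating the delay as milliseconds is an assumption.

```python
import time
from urllib.parse import urlparse


class CrawlDelayLimiter:
    """Enforce a minimum pause between requests to the same host."""

    def __init__(self, delay_ms, clock=time.monotonic, sleep=time.sleep):
        # ASSUMPTION: the delay is expressed in milliseconds.
        self.delay_s = delay_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._last_fetch = {}  # host -> time of the most recent request

    def wait_time(self, url):
        """Return how many seconds to pause before fetching url."""
        host = urlparse(url).netloc
        last = self._last_fetch.get(host)
        if last is None:
            return 0.0  # first request to this host needs no pause
        return max(0.0, self.delay_s - (self._clock() - last))

    def acquire(self, url):
        """Sleep if needed, then record a request to url's host as starting now."""
        pause = self.wait_time(url)
        if pause > 0:
            self._sleep(pause)
        self._last_fetch[urlparse(url).netloc] = self._clock()
        return pause
```

A fetch loop would call `acquire(url)` immediately before each request. The clock and sleep functions are injectable so the behavior can be checked without real waiting.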

Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2012 Lucid Imagination. All rights reserved.