Lucid Imagination Forum » LucidWorks Enterprise

Configuring MIME type restriction

(3 posts) (2 voices)
  • Started 3 months ago by senthilkumar.arumugam
  • Latest reply from senthilkumar.arumugam

Tags:

  • crawler
  1. senthilkumar.arumugam
    Member

    Hi,

    Is there a way to configure and control which MIME types are crawled in the current version of LWE? We can already filter specific MIME types at query time, but we would like to avoid crawling and indexing MIME types that are of no interest to us. Can someone please let us know whether there is a way to tell the crawler not to crawl specific file types, e.g. .zip and .rar?

    Posted 3 months ago #
  2. Lance
    Professional Services Engineer

    On the Data Source configuration page, you will see "Include paths" and "Exclude paths". Both accept regular expressions that are matched against URLs, so you can include or exclude file types by extension. To exclude .zip files, add this to "Exclude paths":

    http://.*\.zip

    This will match every http link that points to a .zip file. For more about regular expressions, see:

    http://lucidworks.lucidimagination.com/display/help/Using+Regular+Expressions
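
    To check an exclude pattern before putting it into the data source, it can help to try it against a few sample URLs. The following is only a rough Python sketch, not LWE code: the `is_excluded` helper, the sample URLs, and the second pattern (covering both .zip and .rar over http and https) are hypothetical, and it assumes the crawler matches a pattern against the whole URL. LWE's own regular expression engine may behave differently in details.

```python
import re

# The first pattern is the one from the reply above.
# The second is an assumed variant that covers .zip and .rar over http and https.
EXCLUDE_PATTERNS = [
    r"http://.*\.zip",
    r"https?://.*\.(zip|rar)",
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the entire URL matches any exclude pattern.

    Assumption: patterns are applied to the full URL, not to a substring.
    """
    return any(re.fullmatch(pattern, url) for pattern in patterns)


# Hypothetical sample URLs for trying out the patterns.
sample_urls = [
    "http://example.com/files/archive.zip",
    "https://example.com/files/backup.rar",
    "http://example.com/index.html",
]

for url in sample_urls:
    print(url, "->", "skip" if is_excluded(url) else "crawl")
```

    Note that under this whole-URL assumption, a link such as archive.zip?download=1 would not be excluded, so the pattern may need extending if the site appends query strings.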

    Posted 3 months ago #
  3. senthilkumar.arumugam
    Member

    This helped and worked. Thank you for the quick reply.

    Posted 3 months ago #


Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2012 Lucid Imagination. All rights reserved.