Lucid Imagination Forum » LucidWorks Enterprise

LucidWorks Built-in Crawler does not index extra HTML meta tags.

(5 posts) (4 voices)
  • Started 4 months ago by pascal.essiembre
  • Latest reply from golden_1

Tags:

  • aperture
  • crawler
  • HTML
  • meta
  • tag
  • web
  1. pascal.essiembre
    Member

    Hello,

    I am indexing web sites, and not all of the HTML meta-data fields are being picked up.
    The LucidWorks web crawler ignores every meta-data field that is not already defined in the Solr schema (the <meta name="blah" content="blah"> fields). Manually defining them as Solr fields or as data source field mappings in LucidWorks makes no difference. I also tried adding them as fmap fields in the Solr Cell section of solrconfig.xml, and setting them in defaults.yml (Tika), with no success. Each time, I restarted LWE-Core and wiped out the data source before re-creating it, to make sure I was starting fresh. I searched thoroughly through your documentation, this forum, and elsewhere online, with no luck.
    Can you please tell me how to index custom HTML meta-data fields with your out-of-the-box crawler? We are using LucidWorks Enterprise 2.0. Your guidance would be much appreciated.
    Thank you!

    Posted 4 months ago #
  2. Andrzej Bialecki
    Moderator

    The built-in Web crawler, based on Aperture, doesn't support this. The only meta-tags it collects are: author, description and keywords.

    You can use some other external crawler and integrate it with LucidWorks using an "external" data source type.

    Posted 4 months ago #
  3. golden_1
    Member

    Can Aperture be modified so that it can be configured to collect the value of any meta tag? We are evaluating whether LucidWorks Enterprise can replace our Google Search Appliances, and this would be a requirement.

    Posted 2 months ago #
  4. Lance
    Professional Services Engineer

    This is supported in the next release of LucidWorks. LucidWorks 2.1 will be released within the next 2 weeks.

    All <META name="name" content="content" /> tags become fields named "attr_meta_name". 

    Posted 2 months ago #
  5. golden_1
    Member

    Fantastic, that is great news.

    Posted 2 months ago #
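
For readers who are still on 2.0 and go the external-crawler route suggested in reply 2, here is a minimal sketch of the extraction step, using only the Python standard library. It is not LucidWorks or Aperture code. The names `MetaTagCollector` and `meta_to_fields` are hypothetical, and the `attr_meta_` prefix simply mirrors the field naming described in reply 4. Lowercasing the meta name and storing values as lists are assumptions of this sketch, not documented LucidWorks behavior. Sending the resulting fields to an "external" data source is left out, because the thread does not describe that API.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect every <meta name="..." content="..."> pair found in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # meta name -> list of content values

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META ...> matches too.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        # Skip tags such as <meta charset="utf-8"> that carry no name/content pair.
        if name and content is not None:
            # Assumption: normalize names to lowercase; keep repeated tags as a list.
            self.meta.setdefault(name.strip().lower(), []).append(content)


def meta_to_fields(html, prefix="attr_meta_"):
    """Return a dict of field name -> values, one field per meta tag name."""
    collector = MetaTagCollector()
    collector.feed(html)
    return {prefix + name: values for name, values in collector.meta.items()}


if __name__ == "__main__":
    # Hypothetical page used only for illustration.
    page = (
        '<html><head>'
        '<META name="Department" content="Sales" />'
        '<meta name="keywords" content="search, solr">'
        '<meta charset="utf-8">'
        '</head><body></body></html>'
    )
    for field, values in sorted(meta_to_fields(page).items()):
        print(field, values)
```

The resulting dictionary can then be merged into whatever document structure the external crawler submits for indexing.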

Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2012 Lucid Imagination. All rights reserved.