
Lucid Imagination Forum » LucidWorks Enterprise

De-duplication does not work with body field

(2 posts) (2 voices)
  • Started 4 months ago by senthilkumar.arumugam
  • Latest reply from alexey.serba


  1. senthilkumar.arumugam
    Member

    Hi,

    We have a few pages that are represented by multiple URLs, so they are getting indexed more than once in LWE. When we search for specific content from such a page (doc), the same content is displayed multiple times with different URLs. We tried using the de-duplication feature to remove these duplicated pages: we created a new field type, assigned the <body> tag to the new field, and selected the new field type for de-duplication with the overwrite option. But this did not remove the duplicated content.

     

    Is it possible to use custom-defined fields for de-duplication? Please advise.

     

    Thanks...

    Posted 4 months ago #
  2. alexey.serba
    Moderator

    Are you sure that your pages are exact duplicates? By default, the LWE de-duplication feature detects only exact duplicates (although it is possible to switch to a fuzzy algorithm by changing the related portion of the Solr configuration). To troubleshoot this feature, I would recommend that you 1) change signatureField to be stored, 2) change the de-duplication type to the "Tag" option, 3) re-index your content, and 4) verify that the signature hashes are the same for these documents.
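
    To illustrate why exact duplicate detection can miss pages that look identical, here is a minimal Python sketch of the check in step 4. It is not how LWE or Solr computes signatures internally: MD5 is used only as a stand-in for an exact hash, and the URLs, body strings, and function names are hypothetical. The point is that a single differing character in the field (here, a trailing space) yields a different signature, so the documents are not treated as duplicates.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def exact_signature(text: str) -> str:
    """Hash the raw field value; any byte difference changes the result."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def group_by_signature(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each signature to the list of URLs whose body produced it."""
    groups = defaultdict(list)
    for url, body in docs.items():
        groups[exact_signature(body)].append(url)
    return dict(groups)


# Hypothetical indexed documents: URL -> extracted body field.
docs = {
    "http://example.com/page": "<p>Hello world</p>",
    "http://example.com/page?ref=home": "<p>Hello world</p>",
    # Same visible content, but with a trailing space in the body.
    "http://example.com/page?session=42": "<p>Hello world</p> ",
}

groups = group_by_signature(docs)

# Only URLs sharing an identical signature count as exact duplicates.
duplicate_sets = [sorted(urls) for urls in groups.values() if len(urls) > 1]
print(duplicate_sets)
```

    If the stored signatures of your pages differ in the same way, the bodies are not byte-for-byte identical, and a fuzzy algorithm may be a better fit than exact matching.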

    Posted 4 months ago #

Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2012 Lucid Imagination. All rights reserved.