
Blog

Indexing rich files into Solr, quickly and easily

By Erik Hatcher, August 31, 2011

This past weekend I gave yet another “Rapid Prototyping with Solr” presentation, this time back in the saddle with the No Fluff, Just Stuff symposium in Raleigh, NC. I intentionally waited until the last minute to hack together a quick script to index some data I hadn’t indexed before, to demonstrate the ease with which one can grab Solr and immediately put it to use. This time around I cobbled together a …


The Apache Lucene Ecosystem: My View of 2010

By Grant Ingersoll, December 27, 2010

After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem.  For anyone who follows the amalgamation of projects that I like to call the Lucene Ecosystem (the Apache projects: Lucene, Solr, Nutch, Mahout, Tika, PyLucene, Lucy, Lucene.NET, Droids, ManifoldCF — Lucene Connector Framework, OpenNLP and UIMA) you know it …


Extending Apache Tika Capabilities

By Sami Siren, June 18, 2010

Apache Tika is a toolkit for extracting metadata and textual content from various document formats. Tika itself implements parsers for some document formats, while it relies on external libraries (such as Apache PDFBox and Apache POI) to parse many more.

Tika provides a uniform Java API for all of the supported document formats to make life easier for the user.  Additionally, Tika provides functionality for detecting document type and content language.
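To make the idea of document type detection concrete, here is a minimal sketch in Python of one common approach: checking the leading "magic" bytes of a file. This is not Tika's API or implementation. The `detect_type` function and the `MAGIC_SIGNATURES` table are hypothetical names chosen for illustration, and the signature list is deliberately tiny.

```python
# Illustrative only: a tiny magic-byte sniffer, not Tika's detector.
# MAGIC_SIGNATURES and detect_type are hypothetical names for this sketch.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # ZIP container (also used by OOXML and ODF files)
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def detect_type(data: bytes) -> str:
    """Guess a MIME type from the first bytes of a document."""
    for prefix, mime in MAGIC_SIGNATURES:
        if data.startswith(prefix):
            return mime
    # No known signature: call it plain text if the sample decodes as UTF-8,
    # otherwise fall back to the generic binary type.
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"
```

A caller would typically read the first few kilobytes of a file and pass them to `detect_type`. A real detector such as Tika's also weighs file names and container contents, which this sketch ignores.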

In my earlier …


Berlin Buzzwords Recap

By Grant Ingersoll, June 11, 2010

Back from Berlin Buzzwords and finally over the jet lag, so I thought I would put up some feedback.  First off, it was a well-organized conference with a nice focus on searching, storage and scaling.  Kudos to Isabel, Simon and Jan for all their hard work.  It also had great wi-fi coverage, which is always a struggle at every conference I’ve ever been to.

As for the talks, I gave the Keynote on using …


Apache Lucene EuroCon Agenda – The Revolution is On!

By Grant Ingersoll, April 22, 2010

After reviewing a lot of great talk proposals, we’ve announced the agenda for Apache Lucene EuroCon: Apache Lucene EuroCon – Europe’s Premier Lucene and Solr Search User Conference.

One of the things I really like about this agenda is it is a great mix of basics, use cases from all over the search map (CMS, news, social media, advertising), business decisions (see last list and next list) and advanced topics (NLP, collab filtering, machine …


News Flash: Apache Lucene gives birth to triplets!

By Grant Ingersoll, April 21, 2010

Apache Lucene (the Lucene top-level project, not Lucene the Java search API; I know, it’s confusing sometimes) has once again proved to be a fertile area for innovation. Having already given birth to Apache Hadoop a few years back, it has now given birth to three new Apache Top Level Projects (just approved by the Board at Apache): Apache Mahout, Apache Nutch and Apache Tika (never mind the URLs, …


Apache Lucene Connector Framework now in Incubation at the ASF

By Grant Ingersoll, January 20, 2010

Short Version

The Apache Lucene Connector Framework project has officially entered incubation.  LCF, for short, is going to be a framework for connecting to content repositories like SharePoint, Documentum, etc. It will make it easy to hook into Lucene, Solr, Nutch, Mahout and Tika while, of course, remaining agnostic of the final destination of the data.  See the Connectors website and the original proposal for more info.  Help wanted!

Long Version

Background

A while back, MetaCarta…


The Apache Lucene Ecosystem: My view of 2009

By Grant Ingersoll, December 24, 2009

It’s that time of year, so I thought I would take a look back at the year that was for the Lucene Ecosystem and maybe look ahead just a little bit too.

First and foremost, it should be obvious to even the most casual observer that the Apache Lucene communities are thriving.  Not only is it a great time to be involved in open source, it’s a great time to be involved in Lucene.  Both …


SF Bay Area Meetup Slides Available

By Grant Ingersoll, June 5, 2009

Slides from the first Lucene/Solr SF Bay Area meetup are now available here.

Thanks to everyone who participated.…


ApacheCon Europe Follow Up

By Grant Ingersoll, April 1, 2009

Another year, another successful ApacheCon Europe, at least as far as Lucene, Solr and I are concerned.  This year, like last, Erik Hatcher and I ran trainings on Lucene and Solr.  Both were well attended, despite the economy, showing once again the power of open source and the fact that people are still invested in search.  (If you missed the training, see here for alternatives.)

During the conference, there were several talks on Lucene, …



Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2011 Lucid Imagination. All rights reserved.