August 31, 2010
Introduction
Recently, I did some minor work on improving the usability of the Lucene spell checker (see LUCENE-2479, LUCENE-2608 and the associated Solr work) and it got me thinking that a post on spell checking in Solr would be useful.
For those who aren’t familiar, the notion of spell checking in search (often called Did You Mean?) is slightly different from the notion of simply correcting spelling errors. It’s not that we don’t…
Read more...
August 19, 2010
Do you remember this scenario from days of yore?
- Company A buys a software license from Company B, a startup.
- Company A crosses its fingers that Company B doesn’t go bankrupt and disappear, along with the source code for Company A’s mission-critical software.
- Company B goes kaput.
- Company A is left with some machine-readable binary code that it is powerless to develop or use.
Source code escrow has changed the outcome of this…
Read more...
July 29, 2010
If you missed the SF Bay Area Lucene meetup last night, I thought I would give a recap of some of the highlights. First off, thanks to salesforce.com for the use of their space on the 42nd floor of 1 Market St. in downtown S.F. The views of the bay and the city were especially stunning at night with what appeared to be a full moon rising over the Bay Bridge. Salesforce…
Read more...
July 15, 2010
As some of you may know, I blog regularly on Network World’s Open Source Subnet. Watch weekly for more of my musings on trends, news and any number of topics that catch my interest. In my most recent post, I ask readers for their take on the legal maze associated with open source. In my opinion, Apache is the most liberal open source package today, the one most true to form. Everybody can use it,…
Read more...
July 9, 2010
Here are my slides from the talk I gave last night at the RTP Semantic Web Group:
Read more...
June 11, 2010
Back from Berlin Buzzwords and finally over the jet lag, so I thought I would put up some feedback. First off, it was a well-organized conference with a nice focus on searching, storage and scaling. Kudos to Isabel, Simon and Jan for all their hard work. It also had great wi-fi coverage, which is always a struggle at every conference I’ve ever been to.
As for the talks, I gave the Keynote on…
Read more...
April 30, 2010
The other day, Michael Coté asked me where Apache Lucene and Solr fit in with the NoSQL movement (having heard about the Guardian’s use of Solr), to which I replied: I haven’t used SQL in any significant way since I started using Lucene in 2004 (and I started my career doing Oracle DBA work, etc., way back when). We just didn’t have a fun name for it “back in the day”.
All kidding…
Read more...
April 22, 2010
After reviewing a lot of great talk proposals, we’ve announced the agenda for Apache Lucene EuroCon: Apache Lucene EuroCon – Europe’s Premier Lucene and Solr Search User Conference.
One of the things I really like about this agenda is that it is a great mix of basics, use cases from all over the search map (CMS, news, social media, advertising), business decisions (see last list and next list) and advanced topics (NLP, collab filtering, machine…
Read more...
April 21, 2010
Apache Lucene (the Lucene top-level project, not Lucene the Java search API; I know, it’s confusing sometimes) has once again proved to be a fertile area for innovation. Having already given birth to Apache Hadoop a few years back, it has now given birth to three new Apache Top Level Projects (just approved by the Board at Apache): Apache Mahout, Apache Nutch and Apache Tika…
Read more...
March 18, 2010
Here’s the announcement:
Apache Mahout <http://lucene.apache.org/mahout> 0.3 has been released and is
now available for public download at
http://www.apache.org/dyn/closer.cgi/lucene/mahout
Up-to-date maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/
Apache Mahout is a subproject of Apache Lucene with the goal of
delivering scalable machine learning algorithm implementations under
the Apache license. http://www.apache.org/licenses/LICENSE-2.0
Mahout is a machine learning library meant to scale: Scale in terms of community to support
…
Read more...