RTP Semantic Web Slides are available

Here are my slides from the talk I gave last night at the RTP Semantic Web Group:

Berlin Buzzwords Recap

Back from Berlin Buzzwords and finally over the jet lag, so I thought I would put up some feedback.  First off, it was a well-organized conference with a nice focus on search, storage and scaling.  Kudos to Isabel, Simon and Jan for all their hard work.  It also had great wi-fi coverage, which has been a struggle at every conference I’ve ever been to.

As for the talks, I gave the Keynote on…

Apache Lucene EuroCon Agenda – The Revolution is On!

After reviewing a lot of great talk proposals, we’ve announced the agenda for Apache Lucene EuroCon: Apache Lucene EuroCon – Europe’s Premier Lucene and Solr Search User Conference.

One of the things I really like about this agenda is that it is a great mix of basics, use cases from all over the search map (CMS, news, social media, advertising), business decisions (see last list and next list) and advanced topics (NLP, collaborative filtering, machine…

News Flash: Apache Lucene gives birth to triplets!

Apache Lucene (the Lucene top-level project, not Lucene the Java search API; I know, it’s confusing sometimes) has once again proved to be a fertile area for innovation.  Having already given birth to Apache Hadoop a few years back, it has now given birth to three new Apache Top Level Projects (just approved by the Board at Apache): Apache Mahout, Apache Nutch and Apache Tika.

Apache Mahout 0.3 Released

Here’s the announcement:

Apache Mahout <http://lucene.apache.org/mahout> 0.3 has been released and is
now available for public download at
http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up-to-date maven artifacts can be found in the Apache repository at

https://repository.apache.org/content/repositories/releases/org/apache/mahout/

Apache Mahout is a subproject of Apache Lucene with the goal of
delivering scalable machine learning algorithm implementations under
the Apache license. http://www.apache.org/licenses/LICENSE-2.0

Mahout is a machine learning library meant to scale: Scale in terms of
community to support

Integrating Apache Mahout with Apache Lucene and Solr – Part I (of 3)

Introduction

As Apache Mahout is about to release its next version (0.3), I thought I would share some thoughts on how it might be integrated with Apache Lucene and Apache Solr.  For those who aren’t aware of Mahout, it is an ASF project building a library of machine learning algorithms that are designed to be scalable (often via Apache Hadoop) and licensed under the Apache License (i.e., commercially friendly). …

Intro to Mahout at Triangle Java User Group on Feb. 15

Monday, 15 February 2010
18:00 to 21:00

I will be giving an introduction to Apache Mahout at the Triangle Java User Group on Feb. 15.  See http://trijug.org/ for more details.  Hope to see you there!

Apache Lucene Connector Framework now in Incubation at the ASF

Short Version

The Apache Lucene Connector Framework project has officially entered incubation.  LCF, for short, is going to be a framework for connecting to content repositories like SharePoint, Documentum, etc.  It will make it easy to hook into Lucene, Solr, Nutch, Mahout and Tika while, of course, remaining agnostic of the final destination of the data.  See the Connectors website and the original proposal for more info.  Help wanted!

Long Version

Background

A while…

The Apache Lucene Ecosystem: My view of 2009

It’s that time of year, so I thought I would take a look back at the year that was for the Lucene Ecosystem and maybe look ahead just a little bit too.

First and foremost, it should be obvious to even the most casual observer that the Apache Lucene communities are thriving.  Not only is it a great time to be involved in open source, it’s a great time to be involved in Lucene. …

Apache Mahout 0.2 Released

I just sent out the Apache Mahout 0.2 release announcement.  Here’s a copy:

Apache Mahout 0.2 has been released and is now available for public
download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Apache Mahout is a subproject of Apache Lucene with the goal
of delivering scalable machine learning algorithm implementations
under the Apache license. http://www.apache.org/licenses/LICENSE-2.0
Scale in terms of computation to the
size of data you manage today.  Scale in terms of community to
