You know your (technical) baby is (almost) grown up when the book on the project finally comes out. Such is the case for Apache Mahout, thanks to Manning Publications shipping Mahout in Action this week.
So, before I start into my review, let me first say congratulations to Sean, Robin, Ted, Ellen and Manning for producing such an excellent product. The simplest praise I can give it is to put it on the same …
Read more
Many times, clients ask us to help them estimate memory usage or disk space usage or to share benchmarks as they build out their search system. Doing so is always an interesting process, as I’ve always been wary of claims about benchmarks (for instance, one of the old tricks of performance benchmark hacking is to “cat XXX > /dev/null” to load everything into memory first, which isn’t what most people do when running their system) …
Read more
Introduction
During a past ecommerce webinar with Brian Doll of Sheetmusicplus.com, I posted a checklist of items that commonly occur in ecommerce applications. Then, due to time constraints, I waved my hands, said Solr (and now LucidWorks) can do almost all of them out of the box, and left the rest as an exercise for the reader. (Note: the slides are available here. Registration required.) Well, now I …
Read more
Every now and then we get asked what the heck is a shingle in Lucene, as in the ShingleFilter or the ShingleMatrixFilter, so it seems worthwhile to provide some info on shingles in Lucene, Solr and LucidWorks Enterprise. First off, a shingle is just a word-based n-gram, as opposed to a character-based n-gram (NGramTokenizer, NGramTokenFilter, EdgeNGramTokenizer and EdgeNGramTokenFilter provide the latter functionality). We named it shingles just to differentiate the two when it comes …
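To make the word-versus-character distinction concrete, here is a minimal sketch in plain Python. It is not Lucene's ShingleFilter and does not reproduce its options or output format; the function names `word_shingles` and `char_ngrams` and their parameters are made up for illustration, and whitespace splitting stands in for a real tokenizer.

```python
def word_shingles(tokens, min_size=2, max_size=2):
    """Return word-based n-grams (shingles), each joined with a single space.

    Hypothetical helper for illustration only; not part of Lucene or Solr.
    """
    shingles = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            # Only emit a shingle if enough tokens remain from this position.
            if start + size <= len(tokens):
                shingles.append(" ".join(tokens[start:start + size]))
    return shingles


def char_ngrams(text, size=2):
    """Return character-based n-grams of a single string, for contrast."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


if __name__ == "__main__":
    tokens = "please divide this sentence".split()
    print(word_shingles(tokens))     # ['please divide', 'divide this', 'this sentence']
    print(char_ngrams("please", 2))  # ['pl', 'le', 'ea', 'as', 'se']
```

The shingles are built from whole tokens, while the character n-grams slide over the letters of one token, which is the difference the naming is meant to capture.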
Read more
A week and a day later, I’ve finally got a chance to put up my thoughts/notes on the first ever RTP Apache Lucene/Solr Meetup hosted by Lulu Press and co-sponsored by Lucid Imagination.
First off, hats off to Lulu for the excellent hosting, coordination and marketing of the event. You could definitely see the evidence of Lulu’s “Be Remarkable” philosophy in the event. I’d say we had roughly 30-40 people for the first-time event, …
Read more
I was recently with a client doing a Best Practices assessment when I came across a common source of confusion related to sorting, faceting and schema design.
As background, Solr provides a schema that describes the Fields and Field Types (FT) that are used by an application. Field Types describe how Solr should handle the information contained in a Field. For instance, the integer FT tells Solr to treat the contents of any Field of …
Read more
Monday, 23 March 2009 to Friday, 27 March 2009
Lucene and me at ApacheCon EU in Amsterdam March 23-27.
I’ve posted a Lucene-related event schedule on my blog for people who are interested. Of particular note are the two days of pre-conference training on both Lucene and Solr. These are shorter ApacheCon versions of our 3-day training classes. Obviously, we can’t cover all the material that we do in our full …
Read more