I’ve had quite a few people asking me about the state of geospatial support in Apache Solr lately, so I thought I would give a brief update here.
Much of the functionality behind SOLR-773 is now implemented in the trunk version of Solr and is available for checkout. This includes support for several different distance measures (Euclidean, Haversine, etc.) as well as support for sorting by functions (aka sorting by distance). Do note there are some minor issues left to fix on that one; see SOLR-1297 for the gotchas. There is also support for several different point-based field types now. See SOLR-1131 for more info.
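For readers new to these distance measures, here is a minimal sketch of the Haversine (great-circle) distance and of sorting documents by it. It is plain Python for illustration only, not Solr's implementation; the function names, the `lat`/`lon` document keys, and the Earth radius constant are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point drift before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def sort_by_distance(docs, lat, lon):
    """Return docs ordered nearest-first relative to (lat, lon).

    Each doc is a dict with hypothetical 'lat' and 'lon' keys.
    """
    return sorted(docs, key=lambda d: haversine_km(lat, lon, d["lat"], d["lon"]))
```

Sorting by distance is then just sorting by the value of a function of each document's point field, which is the idea behind "sorting by functions" above.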
Right now, I’m working on SOLR-1568, which will add the last “major” piece of needed functionality: spatial filtering based on FieldType. I’m getting close to putting up a patch for review, but it will then take a week or two more from there to iterate and commit.
Beyond that, there are some minor things that would be nice to have, but I don’t think they are showstoppers for the basic spatial use cases (sort, boost, filter by distance). For those wanting deeper capabilities along the lines of shape intersections, that’s a bit farther off, unless, of course, you have a patch!
As always, feedback welcome!
Related link: On-Demand webinar
From Here to There, You Can Find it Anywhere: Building Local/Geo-Search with Apache Lucene and Solr


Grant, somewhat related to the above: we are giving some consideration to initiating an effort which would marry together the capabilities of Solr with GeoServer (geoserver.org), the latter being a pure Java open source geospatial server. The preliminary idea would be to build an open source component which would interoperate with Solr and GeoServer. The proposed component would parse out the search criteria (dividing them into criteria for Solr and criteria for GeoServer), send the requests to the Solr and GeoServer servers respectively, integrate the two result sets, and send the net result back to the calling application.
What I’m wondering is whether such an effort would be complementary to what you’re doing with Solr as described above? Obviously, I’m looking to stay on the right side of Solr development effort and would not want to charge off in a direction which did not complement your efforts in some way. The perceived benefit of the Solr/GeoServer idea is that you have the full geospatial capabilities from the get go, as opposed to having to build them over time.
Any thoughts or comments you have in this regard are most welcome. Thanks and best regards…
Terence Gannon
Intellog Inc.
July 23, 2010 09:35 — Terence Gannon
Hi Terence,
I’m not an expert in GeoServer. I’m a little unsure, however, of what you mean by “full geospatial capabilities from the get go”. What proposed features does GeoServer have that would help solve spatial search problems? From my naive view, it seems that geoserver mainly deals with storing and managing map data and then displaying it. Can it do things like shape intersections, bounding box calculations, geocoding, etc.?
Also, as far as adding any specific integration code into the Solr core, the GPL license is a show stopper. Naturally, that doesn’t prevent it from being hosted elsewhere, it just means we can’t commit any code that has a dependency on GeoServer to the Solr code base.
HTH,
Grant
Cheers,
Grant
August 13, 2010 07:35 — Grant Ingersoll
All,
GeoServer is an application that allows you to publish geospatial data in OGC-compliant formats like WMS, WCS, WFS, KML, etc. The data “sources” can be flat-file data sets like GML, Shapefiles, KML or external relational databases like Oracle Spatial and PostgreSQL/PostGIS. There is also support for raster data, which can be stored in the index but not in its native form.
At its core, GeoServer uses several other open source libraries like GEOS/JTS, PROJ4 and GDAL to read and translate the data. I believe that implementing these technologies in the Solr core would be more beneficial than trying to make the index work with GeoServer. After actually writing that, I question what the best approach is for this endeavor.
1. Integrating Solr into GeoServer would provide some really great advanced search capabilities to the application, but…
2. Integrating some of the aforementioned libraries into Solr would essentially make Solr an indexer for geospatial data which could potentially surpass the performance of relational databases.
Geoserver does not necessarily allow you to perform all the predicate geometry functions like buffering and intersecting of features. Again, you would have to rely on something like the JTS to perform these operations on any geometries stored in the index.
Given that Solr is so easy to extend, it would be relatively painless to return geometries stored in the index as various other file formats (OGC compliant or not). Check out the GeoTools Suite (http://www.osgeo.org/geotools) from the same company who developed GeoServer. They wrap a lot of the underlying functionality that Solr would need into a nice single package.
There are many, many directions to go from here and I am *very* interested to see more geospatial support in these awesome Apache projects. Please let me know what I can do to help.
Kindest Regards,
Adam
August 15, 2010 13:25 — Adam Estrada
Perhaps I should back up a step or two and provide some context for my initial question to Grant. The basic requirement which I want to address is search which permits both content and geospatial criteria to be specified and have results returned that meet both of those criteria.
The origin of this requirement is the oil & gas industry, and specifically data related to hydraulic fracturing. These are expensive treatments (millions of dollars per well in some cases) so it’s desirable to increase the odds of success by looking at relevant treatment information from past jobs. Something along the lines of “show me all treatment information where chemical x and process y were used, but only as it relates to the Montney formation within 10 kilometres of the well in question.” In this example, chemical x and process y are content-based criteria and “Montney” and “within 10 kilometres” are geospatial criteria.
The theory would be to develop a component which would parse out the two kinds of criteria and send them off to their respective subsystems for handling: Solr for content-based criteria and whatever-the-geospatial-engine-would-be for the geospatial criteria. The two systems would process the queries and return results to the newly-developed component, which would take responsibility for integrating the two sets of results and presenting them back to the user. Ideally, the new component would do all of this without touching the code base of either of the subsystems it calls, so there are no licensing entanglements.
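To make the idea concrete, here is a rough sketch of the two steps such a component would own: splitting the criteria and integrating the two result sets. It does not call Solr or any geospatial engine; the criteria keys, the `SPATIAL_KEYS` set, the function names, and the shape of the result sets are all hypothetical, chosen only to illustrate the flow.

```python
# Hypothetical set of criteria names that the geospatial engine would handle.
SPATIAL_KEYS = {"formation", "center", "radius_km"}


def split_criteria(criteria):
    """Divide a flat dict of criteria into (content, spatial) dicts."""
    spatial = {k: v for k, v in criteria.items() if k in SPATIAL_KEYS}
    content = {k: v for k, v in criteria.items() if k not in SPATIAL_KEYS}
    return content, spatial


def merge_results(text_hits, spatial_hits):
    """Keep only documents matched by both subsystems.

    text_hits: list of (doc_id, score) pairs in relevance order,
               as a text engine might return them (assumed shape).
    spatial_hits: dict mapping doc_id to distance in km (assumed shape).
    The relevance order of text_hits is preserved in the output.
    """
    merged = []
    for doc_id, score in text_hits:
        if doc_id in spatial_hits:
            merged.append({"id": doc_id,
                           "score": score,
                           "distance_km": spatial_hits[doc_id]})
    return merged
```

One design consequence worth noting: intersecting result sets outside both engines means each engine must return every match for its half of the query, which can get expensive when either half is not very selective.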
Let me know if that helps, or if there is more information you require. The main objective is to only develop what’s absolutely required, and not re-invent any wheel which already exists. Thanks for your help! Cheers…TCG
August 16, 2010 12:43 — Terence Gannon
Hi Terence,
RE the example on hydraulic fracturing, this is already something that Solr supports in the current trunk. Some things Solr still needs in the way of spatial are:
1. Geocoding query parser and indexing components
2. Shape intersections, etc.
August 17, 2010 03:18 — Grant Ingersoll
Do any more detailed requirements documents exist for these two items? Perhaps we should be thinking of contributing new functionality to Solr, rather than building a new, standalone component. Thoughts?
August 17, 2010 06:04 — Terence Gannon
Hi Terence,
http://wiki.apache.org/solr/SpatialSearch contains most of the detail about Spatial. Also, there are specific JIRA issues cited above (start with SOLR-773). Also, see http://www.lucidimagination.com/blog/2010/07/20/update-spatial-search-in-apache-lucene-and-solr/
I don’t have much at this point on shape intersections other than it would be nice to be able to build up more complex bounding boxes, etc. for filtering based on shapes and intersections with multiple shapes.
As for geocoding, I think we need tools that can translate addresses, etc. into lat/lon, etc. Geonames is a possible starting place. I imagine Open Streets has some capability too. I haven’t looked too much into it yet.
August 19, 2010 06:18 — Grant Ingersoll
That’s good info, Grant, thank you.
August 19, 2010 06:40 — Terence Gannon