Proximity Searching with GeoHash
Software
Monday, 02 August 2010 08:12

Need to compare Geo Location coordinates? Then you just might love GeoHash as much as I do. Many GeoLocation hashing algorithms facilitate scalable proximity searching. GeoHash is such an algorithm and is in the public domain. There are also many implementations of Geohash available. I am using davetroy's JavaScript implementation of GeoHash. The README of davetroy's project explains the algorithm quite well:

The Geohash algorithm was first described by Gustavo Niemeyer in February 2008. By interleaving latitude and longitude information in a bitwise fashion, a composite value is generated that provides a high resolution geographic point, and is well suited for storage or transmission as a character string.

Geohash also has the property that as the number of digits decreases (from the right), accuracy degrades. This property can be used to do bounding box searches, as points near to one another will share similar Geohash prefixes.

If you need help visualizing this, checkout the geohash demonstrator.

davetroy's JavaScript implementation of GeoHash is dead simple. There are only two public methods you need to know, as follows.

string encodeGeoHash(latitude, longitude)
object decodeGeoHash(geohash)

Using GeoHash for Proximity Searching

Let's look at how we might use GeoHash for simple, scalable proximity searching. Suppose we have the following locations and our current location as follows:

var locations = [
  { latitude: 47.8559705, longitude: -181.9415665 },
  { latitude: 47.8558705, longitude: -181.9416665 },
  { latitude: 47.8557705, longitude: -181.9417665 },
  { latitude: 47.8556705, longitude: -181.9418665 },
  { latitude: 47.8555705, longitude: -181.9419665 }
];
 
var currentLocation = { latitude: 47.8557715, longitude: -181.9417665 };

Now, let's efficiently find the two closest locations to our current location. First, I would start by computing the GeoHash for each location.

currentLocation.geoHash = encodeGeoHash(currentLocation.latitude, currentLocation.longitude);
for (var i = 0; i < locations.length; i++) {
  locations[i].geoHash = encodeGeoHash(locations[i].latitude, locations[i].longitude);
}

Now, we can find the nearest locations with the following function.

function getNearest(currentLocation, locations, maxNeighbors) {
  var matching = {},
      accuracy = 12,
      matchCount = 0;
  while (matchCount < maxNeighbors &amp;&amp; accuracy > 0) {
    var cmpHash = currentLocation.geoHash.substring(0,accuracy);
    for (var i = 0; i < locations.length; i++) {
      if (locations[i].geoHash in matching) continue; //don't re-check ones that have already matched
      if (locations[i].geoHash.substring(0,accuracy) === cmpHash) {
        matching[locations[i].geoHash] = locations[i];
        matchCount++;
        if (matchCount === maxNeighbors) break;
      }
    }
    accuracy--;
  }
  var tmp = [];
  for (var geoHash in matching) {
    tmp.push(matching[geoHash]);
  }
  return tmp;
}

This function takes three parameters. The first two are pretty self explanatory. The last, maxNeighbors, is how many matches we want to find. We start by checking the full geoHash for matches. We then truncate the geoHashes by one more character each iteration of the while loop, thereby looking for more broad matches each time. When we have enough matches, we break from the loop. We store results in a map with geoHash as the key so that we can keep track of which locations have already matched so that we can avoid duplicates. We convert this to an array before returning the results.

This code solved my needs. What is absolutely wonderful about this is not only that it is pretty simple, but that it is scalable. If you're into Hadoop, this logic would by very easy to use with MapReduce. However you want to set up your architecture, this algorithm sets you up well for scalability.

So, now that we've got our logic all worked out, let's find the nearest two locations.

var nearest = getNearest(currentLocation, locations, 2);

And the contents of nearest will be as follows:

[
  { geoHash: "b0800pbh040n", latitude: 47.8557705, longitude: -181.9417665 },
  { geoHash: "b0800pbh8hb5", latitude: 47.8558705, longitude: -181.9416665 }
]

One final note. If you are trying to use this in node.js (like me), you can install this using npm (npm install geohash@latest). Have fun!

Limitations

It should be noted that there are limitations to using geohash for proximity searches. From wikipedia:

One limitation of the Geohash algorithm is in attempting to utilize it to find points in proximity to each other based on a common prefix. Edge case locations close to each other but on opposite sides of the Equator or a meridian can result in Geohash codes with no common prefix[1].

Secondly a Geohash essentially defines a bounding box within which a location lies, therefore two locations may be spatially very close but have different geohashes. In order to be useful to proximity searches, the surrounding eight geohashes of a geohash must be calculated and the locations matching these pulled out, therefore complicating potential usage in proximity searches.

In the code above, I am not checking the eight surrounding geohashes for proximity searching, but instead relying only on size of the matching prefix. This approach is less accurate, but much more simple.

See Also


blog comments powered by Disqus