| Proximity Searching with GeoHash |
| Software |
| Monday, 02 August 2010 08:12 |
|
Need to compare Geo Location coordinates? Then you just might love GeoHash as much as I do. Many GeoLocation hashing algorithms facilitate scalable proximity searching. GeoHash is such an algorithm and is in the public domain. There are also many implementations of Geohash available. I am using davetroy's JavaScript implementation of GeoHash. The README of davetroy's project explains the algorithm quite well:
If you need help visualizing this, checkout the geohash demonstrator. davetroy's JavaScript implementation of GeoHash is dead simple. There are only two public methods you need to know, as follows. string encodeGeoHash(latitude, longitude) object decodeGeoHash(geohash) Using GeoHash for Proximity SearchingLet's look at how we might use GeoHash for simple, scalable proximity searching. Suppose we have the following locations and our current location as follows: var locations = [ { latitude: 47.8559705, longitude: -181.9415665 }, { latitude: 47.8558705, longitude: -181.9416665 }, { latitude: 47.8557705, longitude: -181.9417665 }, { latitude: 47.8556705, longitude: -181.9418665 }, { latitude: 47.8555705, longitude: -181.9419665 } ]; var currentLocation = { latitude: 47.8557715, longitude: -181.9417665 }; Now, let's efficiently find the two closest locations to our current location. First, I would start by computing the GeoHash for each location. currentLocation.geoHash = encodeGeoHash(currentLocation.latitude, currentLocation.longitude); for (var i = 0; i < locations.length; i++) { locations[i].geoHash = encodeGeoHash(locations[i].latitude, locations[i].longitude); } Now, we can find the nearest locations with the following function. function getNearest(currentLocation, locations, maxNeighbors) { var matching = {}, accuracy = 12, matchCount = 0; while (matchCount < maxNeighbors && accuracy > 0) { var cmpHash = currentLocation.geoHash.substring(0,accuracy); for (var i = 0; i < locations.length; i++) { if (locations[i].geoHash in matching) continue; //don't re-check ones that have already matched if (locations[i].geoHash.substring(0,accuracy) === cmpHash) { matching[locations[i].geoHash] = locations[i]; matchCount++; if (matchCount === maxNeighbors) break; } } accuracy--; } var tmp = []; for (var geoHash in matching) { tmp.push(matching[geoHash]); } return tmp; } This function takes three parameters. The first two are pretty self explanatory. The last, maxNeighbors, is how many matches we want to find. We start by checking the full geoHash for matches. We then truncate the geoHashes by one more character each iteration of the while loop, thereby looking for more broad matches each time. When we have enough matches, we break from the loop. We store results in a map with geoHash as the key so that we can keep track of which locations have already matched so that we can avoid duplicates. We convert this to an array before returning the results. This code solved my needs. What is absolutely wonderful about this is not only that it is pretty simple, but that it is scalable. If you're into Hadoop, this logic would by very easy to use with MapReduce. However you want to set up your architecture, this algorithm sets you up well for scalability. So, now that we've got our logic all worked out, let's find the nearest two locations. var nearest = getNearest(currentLocation, locations, 2); And the contents of nearest will be as follows: [ { geoHash: "b0800pbh040n", latitude: 47.8557705, longitude: -181.9417665 }, { geoHash: "b0800pbh8hb5", latitude: 47.8558705, longitude: -181.9416665 } ] One final note. If you are trying to use this in node.js (like me), you can install this using npm (npm install geohash@latest). Have fun! LimitationsIt should be noted that there are limitations to using geohash for proximity searches. From wikipedia:
In the code above, I am not checking the eight surrounding geohashes for proximity searching, but instead relying only on size of the matching prefix. This approach is less accurate, but much more simple. See Also
|