Good post, Bruce. I’ve run into a lot of the same issues you described, especially working with last-mile and delivery companies, where the addresses are a total mess. Things like “123 Main St, red gate, call 555-1234 when you arrive, Apt 4B” are very common, and even Google struggles when there’s too much extra information mixed into the address.
Because of that, I ended up building a tool called AddressHub that focuses mostly on cleaning up the input before sending it to geocoders. It tries to remove instructions, phone numbers, and weird formatting, and fixes common spelling mistakes and street name variations. Then it sends the cleaned address through multiple geocoders, compares their results against each other to spot outliers, and picks the best one.
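To give a feel for it, the cleaning step is conceptually something like this (a simplified Python sketch, not AddressHub's production code; the regex patterns and abbreviation map are just illustrative):

```python
import re

# Illustrative patterns for stripping delivery instructions and phone numbers.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")
INSTRUCTION_RE = re.compile(
    r"\b(?:call|ring|buzz|text)\b[^,]*|\bwhen you arrive\b[^,]*",
    re.IGNORECASE,
)
ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road"}

def clean_address(raw: str) -> str:
    """Strip instructions and phone numbers, expand common abbreviations."""
    text = INSTRUCTION_RE.sub("", raw)        # "call ... when you arrive ..."
    text = PHONE_RE.sub("", text)             # leftover phone numbers
    for abbr, full in ABBREVIATIONS.items():  # "St" -> "Street", etc.
        text = re.sub(rf"\b{abbr}\b\.?", full, text, flags=re.IGNORECASE)
    # Collapse the duplicate whitespace and commas left behind by removals.
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"(?:\s*,)+", ",", text)
    return text.strip(" ,")

print(clean_address("123 Main St, red gate, call 555-1234 when you arrive, Apt 4B"))
# -> "123 Main Street, red gate, Apt 4B"
```

The real thing obviously needs many more rules, but the shape is the same: delete what isn't an address, normalize what is.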
It also caches results where the API terms allow it, and tries open data first to save costs; it falls back to the paid geocoders only when open data doesn’t return a good result.
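The fallback logic looks roughly like this (again a sketch under assumptions: the provider functions, the cache, and the confidence threshold are all placeholders, not a real AddressHub API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GeocodeResult:
    lat: float
    lon: float
    confidence: float  # provider-reported match quality, 0..1

# Placeholder providers; a real setup would call e.g. Nominatim (open data)
# and one or more commercial geocoding APIs.
def open_data_geocoder(address: str) -> Optional[GeocodeResult]: ...
def paid_geocoder(address: str) -> Optional[GeocodeResult]: ...

# Cache only results from providers whose terms of service permit it.
CACHE: dict[str, GeocodeResult] = {}

def geocode(address: str, min_confidence: float = 0.8) -> Optional[GeocodeResult]:
    """Try open data first; fall back to paid geocoders only when needed."""
    if address in CACHE:
        return CACHE[address]
    providers: list[tuple[Callable[[str], Optional[GeocodeResult]], bool]] = [
        (open_data_geocoder, True),   # free, cacheable
        (paid_geocoder, False),       # paid; terms often forbid caching
    ]
    for provider, cacheable in providers:
        result = provider(address)
        if result is not None and result.confidence >= min_confidence:
            if cacheable:
                CACHE[address] = result
            return result
    return None
```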
This has helped a lot with reducing wrong deliveries caused by addresses that look fine at first glance but are actually way off. If anyone’s interested, the project is at www.address-hub.com. Always happy to get feedback if people have ideas for making it better.
Good post. However, what most people fail to see is that the quality of a geocoder’s results depends heavily on the quality of the input data.
Google normalizes the input address well enough to identify and correct it, but if the input carries too much extra information, like directions for finding the place, street name variations, or phone numbers, it gets confused and the results will not be good.
I deal with lots of very poorly formatted addresses for last-mile and delivery companies, and their data is horrible, so I created AddressHub. It focuses mostly on address normalization, but it also offers a router capability that connects to multiple geocoders and analyzes their results to prevent false positives (results reported as high accuracy that are actually miles away from the real location).
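The false-positive check boils down to comparing the coordinates the different geocoders return and distrusting outliers. A rough sketch (the consensus rule and tolerance here are my illustration, not AddressHub's actual scoring):

```python
import math

def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def consensus(candidates: list[tuple[float, float]], tolerance_km: float = 1.0):
    """Keep only candidates that at least one other geocoder agrees with."""
    return [
        p for i, p in enumerate(candidates)
        if any(haversine_km(p, q) <= tolerance_km
               for j, q in enumerate(candidates) if j != i)
    ]

# Two geocoders agree to within ~50 m; the third is ~100 km off and is dropped.
print(consensus([(40.7128, -74.0060), (40.7130, -74.0055), (41.5, -73.2)]))
```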
It also provides caching in line with each geocoder’s terms and conditions, and checks open data first to see whether the address can be geocoded there (at no cost), falling back to the paid geocoders to find the best result.
Take a look at it and let me know what I should improve: www.address-hub.com
Great article. I used to work at a gig economy leader in Europe, but I couldn’t take it anymore once I understood how broken it is for the service providers, so I created an alternative: http://www.eaziapp.com/en. It’s a platform that empowers people to become lifestyle concierges: they offer the same services they currently offer on gig platforms, but with lower commissions and more focus on running their own business.
We support cleaning at the moment and are working on an MVP for grocery shopping. The whole idea is to let the lifestyle concierge buy groceries for the customer whose home they are going to clean (they need to go there anyway), so the concierge makes more money with almost the same effort.