What is the best method to do location disambiguation for geonames data?
There is a scoring algorithm behind the GeoNames search, but it is not open source and I'm not sure it is very sophisticated (e.g. for "soma, ca" it returns Soma Lake in Canada, which doesn't even have a Wikipedia article, instead of the very popular Soma neighborhood in San Francisco).
There are also some papers I have found on Google Scholar, but they seem very shallow and similar to my own heuristics, such as scoring by something like log(population) + 1000*hasWikipedia(article) + 100*isCity + 10*isCapital.
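To make the heuristic concrete, here is a minimal sketch of that kind of scoring function. The `Place` fields, the `score` function, and the candidate values are all hypothetical and only illustrate the idea; the weights are the ones mentioned above, and using log(population + 1) instead of log(population) is my own assumption to avoid taking the log of zero for features with no population.

```python
import math
from dataclasses import dataclass


@dataclass
class Place:
    # Hypothetical record; these are not real GeoNames field names.
    name: str
    population: int
    has_wikipedia: bool
    is_city: bool
    is_capital: bool


def score(place: Place) -> float:
    """Ad-hoc popularity score: higher means a more likely match."""
    return (
        math.log(place.population + 1)  # +1 is an assumption, avoids log(0)
        + 1000 * place.has_wikipedia    # booleans count as 0 or 1
        + 100 * place.is_city
        + 10 * place.is_capital
    )


# Illustrative candidates for the query "soma, ca"; the values are made up.
candidates = [
    Place("Soma Lake, Canada", 0, False, False, False),
    Place("Soma, San Francisco", 0, True, False, False),
]

best = max(candidates, key=score)
print(best.name)
```

With these made-up values the Wikipedia term dominates, so the neighborhood outranks the lake. That is the behaviour I want for travel articles, but it also shows how much the result depends on hand-picked weights.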
My domain is travel articles, so my scoring function should favor the most probable tourist places (cities and places of interest such as Disneyland, the Colosseum, or Big Ben).
Do you know of any important articles in this field, or of the algorithms used in production by Google Maps, Yahoo, Bing, or even GeoNames?
@yura, this isn't what you're looking for, but I don't think any clever algorithm will be able to consistently disambiguate whether queries like "soma ca" refer to Soma in San Fran or Soma Lake in Canada. The problem is not that your algorithm is not sophisticated enough; the problem is that there is simply not enough information in the query "soma ca".
I don't know how to express it clearly, but there is an information theoretic thing going on here. It's like the way that random data can't be compressed losslessly: there's not enough information in the input to compute the desired output.
Even if a human were to interpret your queries manually, they would not necessarily understand that "soma ca" is supposed to mean Soma in SF. Maybe to you a 2-letter abbreviation like "ca" "naturally" refers to a US state rather than a foreign country, but there is nothing fundamentally "correct" about that choice, and it cannot be derived using pure logic. It's an arbitrary, domain-specific, ad-hoc rule, just like the ad-hoc log(population) heuristic which you referred to.
Some possible "solutions" (aside from designing a telepathic computer which can read users' minds):