
OCR: weighted Levenshtein distance

I'm trying to build an optical character recognition system that uses a dictionary.

In fact, I don't have a dictionary implemented yet =)

I've heard that there are simple metrics based on Levenshtein distance that take into account different distances between different symbols. E.g. 'N' and 'H' look very similar, so d("THEATRE", "TNEATRE") should be less than d("THEATRE", "TOEATRE"), which is impossible with basic Levenshtein distance because every substitution costs the same.

Could you help me find such a metric, please?

leshka asked May 21 '11 09:05


1 Answer

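This might be what you are looking for: http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance (the article helpfully includes some working code). Note that Damerau-Levenshtein distance extends Levenshtein distance with transpositions of adjacent characters.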

Update:

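http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html describes how edit distance can be generalized by assigning different weights to different edit operations, which is closer to what you describe.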
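To make the weighted idea concrete, here is a minimal Python sketch of a Levenshtein distance with per-pair substitution costs. The function name `weighted_levenshtein`, the `CONFUSION` table, and the 0.3 cost for 'H'/'N' are all made up for illustration; in a real OCR system you would probably estimate these costs from your recognizer's confusion statistics.

```python
def weighted_levenshtein(a, b, sub_cost=None, indel_cost=1.0):
    """Edit distance where the substitution cost depends on the character pair.

    sub_cost maps (char, char) pairs to a cost; pairs that are not listed
    cost 1.0, as in plain Levenshtein distance.
    """
    sub_cost = sub_cost or {}

    def cost(x, y):
        if x == y:
            return 0.0
        # Look the pair up in either order; unknown pairs cost a full edit.
        return sub_cost.get((x, y), sub_cost.get((y, x), 1.0))

    # Standard dynamic programme, keeping only the previous row.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,        # delete ca
                curr[j - 1] + indel_cost,    # insert cb
                prev[j - 1] + cost(ca, cb),  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical confusion costs: visually similar glyphs are cheap to swap.
CONFUSION = {("H", "N"): 0.3}

print(weighted_levenshtein("THEATRE", "TNEATRE", CONFUSION))
print(weighted_levenshtein("THEATRE", "TOEATRE", CONFUSION))
```

With that table, "TNEATRE" comes out closer to "THEATRE" than "TOEATRE" does, which is the behaviour asked for in the question. With an empty table the function reduces to ordinary Levenshtein distance.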

satnhak answered Oct 12 '22 14:10