Word distance algorithm for OCR

I am working with an OCR output and I'm searching for special words inside it.

As the output is not clean, I look for elements that match my inputs according to a word distance lower than a specific threshold.

However, I feel that the Levenshtein distance or the Hamming distance are not the best fit, as the OCR always seems to make the same mistakes: I for 1, 0 for O, Q for O... and these "classic" mistakes seem less serious than, say, "A for K". These distances do not take into account how visually similar two characters are (high or low).

Is there a word distance algorithm designed specifically for OCR that would better fit my case? Or should I implement a custom word distance empirically, based on the visual differences between characters?

asked Oct 17 '25 13:10 by zenbeni

1 Answer

The Levenshtein distance can be generalized to use a different cost for each substitution pair (http://en.wikipedia.org/wiki/Levenshtein_distance#Possible_modifications, fifth item). So you can tune it to your needs by assigning low costs to the common OCR confusions and higher costs to unlikely ones.
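
As a rough illustration, here is a minimal sketch of such a weighted Levenshtein distance in Python. The cost table `OCR_COSTS` is a hypothetical example, not a calibrated set of values; you would fill it in empirically from the confusions your OCR engine actually makes.

```python
# Sketch of a Levenshtein distance with per-pair substitution costs.
# Pairs not listed in the table fall back to a default substitution cost.

def weighted_levenshtein(a: str, b: str,
                         sub_costs: dict[tuple[str, str], float],
                         default_sub: float = 1.0,
                         ins_del: float = 1.0) -> float:
    """Dynamic-programming edit distance with configurable substitution costs."""
    m, n = len(a), len(b)
    # dp[i][j] = cost of converting a[:i] into b[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * ins_del
    for j in range(1, n + 1):
        dp[0][j] = j * ins_del
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            else:
                pair = (a[i - 1], b[j - 1])
                # look up the pair in either order, else use the default cost
                sub = sub_costs.get(pair, sub_costs.get(pair[::-1], default_sub))
            dp[i][j] = min(dp[i - 1][j] + ins_del,    # deletion
                           dp[i][j - 1] + ins_del,    # insertion
                           dp[i - 1][j - 1] + sub)    # substitution / match
    return dp[m][n]


# Hypothetical cost table: cheap for visually similar glyphs, full cost otherwise.
OCR_COSTS = {("1", "I"): 0.2, ("0", "O"): 0.2, ("Q", "O"): 0.3, ("5", "S"): 0.3}

print(weighted_levenshtein("HELL0", "HELLO", OCR_COSTS))  # 0.2 (0 -> O is cheap)
print(weighted_levenshtein("HELLA", "HELLK", OCR_COSTS))  # 1.0 (A -> K full cost)
```

With costs like these, "HELL0" ends up much closer to "HELLO" than a word differing by a visually unrelated character, which matches the thresholding behaviour you describe.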

