
How could I make a search match similar words?

I'm trying to automatically categorize short articles, and I need a way to match similar words, e.g. "shelf" and "shelves", or "painting" and "repaint".

I'm using the Porter stemming algorithm, but it only helps in certain situations and only with the end of the word (neither example above works with it).

Is there an algorithm or a related-word list that would help with something like this (short of making my own)?

(I'm working in PHP, so solutions in that language would be most helpful.)

Yehosef asked Oct 31 '10 16:10


1 Answer

The Levenshtein Distance is what you are looking for.

For any two strings, it calculates the minimum number of single-character insertions, substitutions, and deletions needed to change one string into the other.

If the distance is low then the two words are similar.
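
As a rough sketch of how this could look in PHP: the built-in levenshtein() returns the edit distance, and the helper below (isSimilar and its 0.5 cutoff are hypothetical names and values chosen for illustration, not part of PHP) scales that distance by the longer word's length so that short and long words are judged on the same footing.

```php
<?php
// Hypothetical helper: treats two words as "similar" when the edit distance
// is at most $maxRatio of the longer word's length. The 0.5 default is an
// arbitrary starting point that would need tuning on real data.
function isSimilar(string $a, string $b, float $maxRatio = 0.5): bool
{
    $a = strtolower($a);
    $b = strtolower($b);
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return true; // two empty strings are identical
    }
    return levenshtein($a, $b) / $longest <= $maxRatio;
}

$shelfDistance = levenshtein('shelf', 'shelves');    // 3
$paintDistance = levenshtein('painting', 'repaint'); // 5
```

With this cutoff, isSimilar('shelf', 'shelves') is true (3 / 7), while isSimilar('painting', 'repaint') is false (5 / 8), so a prefix change like "re-" may call for a looser threshold or stripping known prefixes first.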

You could also use the Soundex algorithm to determine if two words sound similar.
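
A minimal sketch of the Soundex comparison, assuming plain ASCII English words: PHP's built-in soundex() returns a four-character key, and two words are treated as sounding alike when their keys are equal. The soundsAlike helper is a hypothetical name used only for this example.

```php
<?php
// Hypothetical helper: two words "sound alike" if their Soundex keys match.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

$robertKey = soundex('Robert'); // R163
$rupertKey = soundex('Rupert'); // R163
$alike = soundsAlike('Robert', 'Rupert'); // true
```

Because the key always starts with the word's first letter, Soundex cannot match words that differ at the start, such as "painting" and "repaint".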

See also:
PHP levenshtein function
PHP soundex function

Peter Alexander answered Nov 16 '22 03:11