I want to code the Metaphone 3 algorithm myself. Is there a description? I know the source code is available for sale but that is not what I am looking for.
Since the author (Lawrence Philips) decided to commercialize the algorithm itself it is more than likely that you will not find description. The good place to ask would be the mailing list: https://lists.sourceforge.net/lists/listinfo/aspell-metaphone
but you can also checkout source code (i.e. the code comments) in order to understand how algorithm works: http://code.google.com/p/google-refine/source/browse/trunk/main/src/com/google/refine/clustering/binning/Metaphone3.java?r=2029
The link by @Bo now refers to (now defucnt) project entire source code.
Hence here is the new link with direct link to Source code for Metaphone 3 https://searchcode.com/codesearch/view/2366000/
by Lawrence Philips
Metaphone 3 is designed to return an approximate phonetic key (and an alternate * approximate phonetic key when appropriate) that should be the same for English * words, and most names familiar in the United States, that are pronounced similarly. * The key value is not intended to be an exact phonetic, or even phonemic, * representation of the word. This is because a certain degree of 'fuzziness' has * proven to be useful in compensating for variations in pronunciation, as well as * misheard pronunciations. For example, although americans are not usually aware of it, * the letter 's' is normally pronounced 'z' at the end of words such as "sounds".
The 'approximate' aspect of the encoding is implemented according to the following rules:
* * (1) All vowels are encoded to the same value - 'A'. If the parameter encodeVowels * is set to false, only initial vowels will be encoded at all. If encodeVowels is set * to true, 'A' will be encoded at all places in the word that any vowels are normally * pronounced. 'W' as well as 'Y' are treated as vowels. Although there are differences in * the pronunciation of 'W' and 'Y' in different circumstances that lead to their being * classified as vowels under some circumstances and as consonants in others, for the purposes * of the 'fuzziness' component of the Soundex and Metaphone family of algorithms they will * be always be treated here as vowels.
* * (2) Voiced and un-voiced consonant pairs are mapped to the same encoded value. This means that:
* 'D' and 'T' -> 'T'
* 'B' and 'P' -> 'P'
* 'G' and 'K' -> 'K'
* 'Z' and 'S' -> 'S'
* 'V' and 'F' -> 'F'
* * - In addition to the above voiced/unvoiced rules, 'CH' and 'SH' -> 'X', where 'X' * represents the "-SH-" and "-CH-" sounds in Metaphone 3 encoding.
Actually Metaphone3 is an algorithm with many very specific rules being a result of some test cases analysis. So it's not only a pure algorithm but it comes with extra domain knowledge. To obtain these knowledge and specific rules the author needed to put in a great effort. That's why this algorithm is not open-source.
There is an alternative anyway which is open-source: Double Metaphone. See here: https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With