I'm tinkering with a domain name finder and want to favour those words which are easy to pronounce.
Example: nameoic.com (bad) versus namelet.com (good).
Was thinking something to do with soundex may be appropriate but it doesn't look like I can use them to produce some sort of comparative score.
PHP code for the win.
Here is a function which should work with the most common of words... It should give you a nice result between 1 (perfect pronounceability according to the rules) to 0.
The following function far from perfect (it doesn't quite like words like Tsunami [0.857]). But it should be fairly easy to tweak for your needs.
<?php
// Score: 1
echo pronounceability('namelet') . "\n";
// Score: 0.71428571428571
echo pronounceability('nameoic') . "\n";
function pronounceability($word) {
static $vowels = array
(
'a',
'e',
'i',
'o',
'u',
'y'
);
static $composites = array
(
'mm',
'll',
'th',
'ing'
);
if (!is_string($word)) return false;
// Remove non letters and put in lowercase
$word = preg_replace('/[^a-z]/i', '', $word);
$word = strtolower($word);
// Special case
if ($word == 'a') return 1;
$len = strlen($word);
// Let's not parse an empty string
if ($len == 0) return 0;
$score = 0;
$pos = 0;
while ($pos < $len) {
// Check if is allowed composites
foreach ($composites as $comp) {
$complen = strlen($comp);
if (($pos + $complen) < $len) {
$check = substr($word, $pos, $complen);
if ($check == $comp) {
$score += $complen;
$pos += $complen;
continue 2;
}
}
}
// Is it a vowel? If so, check if previous wasn't a vowel too.
if (in_array($word[$pos], $vowels)) {
if (($pos - 1) >= 0 && !in_array($word[$pos - 1], $vowels)) {
$score += 1;
$pos += 1;
continue;
}
} else { // Not a vowel, check if next one is, or if is end of word
if (($pos + 1) < $len && in_array($word[$pos + 1], $vowels)) {
$score += 2;
$pos += 2;
continue;
} elseif (($pos + 1) == $len) {
$score += 1;
break;
}
}
$pos += 1;
}
return $score / $len;
}
I think the problem could be boiled down to parsing the word into a candidate set of phonemes, then using a predetermined list of phoneme pairs to determine how pronouncible the word is.
For example: "skill" phonetically is "/s/k/i/l/". "/s/k/", "/k/i/", "/i/l/" should all have high scores of pronouncibility, so the word should score highly.
"skpit" phonetically is "/s/k/p/i/t/". "/k/p/" should have a low pronouncibility score, so the word should score low.
Use a Markov model (on letters, not words, of course). The probability of a word is a pretty good proxy for ease of pronunciation. You'll have to normalize for length, since longer words are inherently less probable.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With