Start with this:
[G|C] * [T] *
Write a program that generates this:
Cat
Cut
Cute
City <-- NOTE: this one is wrong, because City has an "ESS" sound at the start.
Caught
...
Gate
Gotti
Gut
...
Kit
Kite
Kate
Kata
Katie
Another example. This:
[C] * [T] * [N]
Should produce this:
Cotton
Kitten
Where should I start my research as I figure out how to write a program/script that does this?
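You can do this by running regular expressions against a pronunciation dictionary, that is, a word list that gives the phonemes of each word. Matching on phonemes rather than spelling is what rules out words like City, whose first sound is S rather than K.
Here's an example in JavaScript: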
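<html>
<head>
<title>Test</title>
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
<script>
// Load the pronunciation dictionary (plain text, one "WORD  PHONEMES" entry per line).
$.get('cmudict0.3', function (data) {
  // [C] * [T] * [N]: first phoneme K, a T phoneme somewhere after it, last phoneme N.
  // Requiring whitespace right after K and T keeps T from also matching the TH phoneme.
  var matches = data.match(/^(\S+)[ \t]+K[ \t]+(?:.*[ \t])?T[ \t]+(?:.*[ \t])?N$/mg) || [];
  $('body').html('<p>' + matches.join('<br/> ') + '</p>');
});
</script>
</head>
<body>
</body>
</html>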
You'll need to download the list of all words from http://icon.shef.ac.uk/Moby/mpron.tar.Z and put it (uncompressed) in the same folder as the HTML file, under the file name the script loads (cmudict0.3). I've only translated the [C] * [T] * [N] pattern into a regular expression, and the output isn't formatted nicely, but it should give you the idea. Here's a sample of the output:
CALTON K AE1 L T AH0 N
CAMPTON K AE1 M P T AH0 N
CANTEEN K AE0 N T IY1 N
CANTIN K AA0 N T IY1 N
CANTLIN K AE1 N T L IH0 N
CANTLON K AE1 N T L AH0 N
...
COTTERMAN K AA1 T ER0 M AH0 N
COTTMAN K AA1 T M AH0 N
COTTON K AA1 T AH0 N
COTTON(2) K AO1 T AH0 N
COULSTON K AW1 L S T AH0 N
COUNTDOWN K AW1 N T D AW2 N
...
KITSON K IH1 T S AH0 N
KITTELSON K IH1 T IH0 L S AH0 N
KITTEN K IH1 T AH0 N
KITTERMAN K IH1 T ER0 M AH0 N
KITTLESON K IH1 T L IH0 S AH0 N
...
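The regular expression above was written by hand for one pattern. As a rough sketch of how the bracket patterns from the question could be translated automatically, the code below builds a regular expression from a pattern string and tests it against each dictionary line. Everything named here is an assumption made for illustration: the `LETTER_TO_PHONEME` table, the `patternToRegExp` and `findWords` helpers, the reading of `*` as "zero or more phonemes of any kind" (which is how the regular expression above treats it), and the few sample lines, which only imitate the format of the output shown above.

```javascript
// Hypothetical mapping from pattern letters to consonant phonemes in the
// dictionary's notation. "C" is treated as the hard K sound, so a word such
// as CITY, which starts with the S phoneme, will not match "[C]".
const LETTER_TO_PHONEME = { C: 'K', K: 'K', G: 'G', T: 'T', N: 'N' };

// Translate a pattern such as "[G|C] * [T] *" into a RegExp that matches a
// single dictionary line of the form "WORD  PH1 PH2 ...".
function patternToRegExp(pattern) {
  const tokens = pattern.match(/\[[^\]]+\]|\*/g) || [];
  const parts = tokens.map((token) => {
    if (token === '*') {
      return '(?:\\s+\\S+)*'; // assumption: zero or more phonemes of any kind
    }
    const phonemes = token.slice(1, -1).split('|').map((letter) => {
      const phoneme = LETTER_TO_PHONEME[letter.trim().toUpperCase()];
      if (!phoneme) throw new Error('No phoneme mapping for ' + letter);
      return phoneme;
    });
    return '\\s+(?:' + phonemes.join('|') + ')'; // exactly one whole phoneme
  });
  return new RegExp('^(\\S+)' + parts.join('') + '\\s*$');
}

// Return the words whose pronunciation matches the pattern, in file order.
function findWords(dictionaryText, pattern) {
  const re = patternToRegExp(pattern);
  return dictionaryText
    .split(/\r?\n/)
    .filter((line) => re.test(line))
    .map((line) => line.split(/\s+/)[0]);
}

// Illustrative sample lines only; a real run would pass the full file contents.
const SAMPLE_DICTIONARY = [
  'CAT  K AE1 T',
  'CITY  S IH1 T IY0',
  'GATE  G EY1 T',
  'KIT  K IH1 T',
  'COTTON  K AA1 T AH0 N',
  'KITTEN  K IH1 T AH0 N',
  'CATHY  K AE1 TH IY0',
].join('\n');

console.log(findWords(SAMPLE_DICTIONARY, '[G|C] * [T] *'));
// → [ 'CAT', 'GATE', 'KIT', 'COTTON', 'KITTEN' ]
console.log(findWords(SAMPLE_DICTIONARY, '[C] * [T] * [N]'));
// → [ 'COTTON', 'KITTEN' ]
```

Testing one line at a time, instead of running a single multiline match over the whole file, keeps the whitespace in the expression from ever spanning a line break. Because each bracket must match a whole phoneme, TH is not mistaken for T, which is why CATHY is left out of the first result.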