Say you want to take input from CMU's phonetic data set that looks like this:
ABERRATION AE2 B ER0 EY1 SH AH0 N
ABERRATIONAL AE2 B ER0 EY1 SH AH0 N AH0 L
ABERRATIONS AE2 B ER0 EY1 SH AH0 N Z
ABERT AE1 B ER0 T
ABET AH0 B EH1 T
ABETTED AH0 B EH1 T IH0 D
ABETTING AH0 B EH1 T IH0 NG
ABEX EY1 B EH0 K S
ABEYANCE AH0 B EY1 AH0 N S
(The word is on the left; to its right is a series of phonemes, key here)
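For reference, a minimal sketch of reading that format into (word, phonemes) pairs, assuming each line is just whitespace-separated tokens with the word first. The names `parse_line` and `parse_entries` are made up for illustration and are not part of any CMU tooling.

```python
# A few lines copied from the sample above.
SAMPLE = """\
ABERRATION AE2 B ER0 EY1 SH AH0 N
ABET AH0 B EH1 T
ABEX EY1 B EH0 K S
"""


def parse_line(line):
    """Split one dictionary line into (word, list of phonemes)."""
    word, *phonemes = line.split()
    return word, phonemes


def parse_entries(text):
    """Parse every non-blank line of `text` into a word -> phonemes dict."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            word, phonemes = parse_line(line)
            entries[word] = phonemes
    return entries


entries = parse_entries(SAMPLE)
print(entries["ABEX"])  # ['EY1', 'B', 'EH0', 'K', 'S']
```

The digits on the vowel phonemes are stress markers (0 for no stress, 1 for primary, 2 for secondary), so a model can either predict them or have them stripped beforehand.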
And you want to use it as training data for a machine learning system that would take new words and guess how they would be pronounced in English.
It's not so obvious, to me at least, because there isn't a fixed number of letters that can map to a single phoneme. I have a feeling that something to do with a Markov chain might be the right way to go.
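To make the alignment problem concrete, here is a small sketch that compares letter counts with phoneme counts for three of the sample words. It only illustrates the mismatch, it is not a solution, and the helper name `length_gap` is hypothetical.

```python
# Word -> phonemes, copied from the sample data above.
PAIRS = {
    "ABERRATION": "AE2 B ER0 EY1 SH AH0 N".split(),
    "ABET": "AH0 B EH1 T".split(),
    "ABEX": "EY1 B EH0 K S".split(),
}


def length_gap(word, phonemes):
    """Return (number of letters, number of phonemes) for one entry."""
    return len(word), len(phonemes)


for word, phonemes in PAIRS.items():
    letters, sounds = length_gap(word, phonemes)
    print(f"{word}: {letters} letters -> {sounds} phonemes")
```

ABERRATION has more letters than phonemes (for example, "TI" collapses to SH), ABET lines up one to one, and ABEX has fewer letters than phonemes (X expands to K S). Any model therefore has to allow one letter to map to several phonemes and several letters to map to one.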
How would you do this?
The problem is called grapheme-to-phoneme (G2P) conversion, a subproblem of natural language processing. Searching for that term brings up a few papers.
Not entirely my field, but maybe build a neural network with several layers: earlier layers to guess how the word splits into sequential syllables, later layers to guess the pronunciation of those syllables.
Setting up an ANFIS-learning neural network is fairly straightforward for numerical data; for literal/phonetic data the task is undoubtedly several orders more complex.
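One small piece of that extra work is turning letters and phonemes into numbers at all. Here is a rough sketch of one common way to do it, assuming integer ids with zero padding. The names `build_vocab`, `encode` and the `<pad>` symbol are invented for this example, and the network itself is left out.

```python
PAD = "<pad>"  # hypothetical padding symbol, always given id 0


def build_vocab(sequences):
    """Map every distinct symbol to an integer id, reserving 0 for padding."""
    vocab = {PAD: 0}
    for seq in sequences:
        for symbol in seq:
            vocab.setdefault(symbol, len(vocab))
    return vocab


def encode(seq, vocab, length):
    """Convert symbols to ids and right-pad with the padding id up to `length`."""
    ids = [vocab[symbol] for symbol in seq]
    return ids + [vocab[PAD]] * (length - len(ids))


# Two entries from the sample data; a real run would use the whole file.
pairs = [
    ("ABET", ["AH0", "B", "EH1", "T"]),
    ("ABEX", ["EY1", "B", "EH0", "K", "S"]),
]

letter_vocab = build_vocab(word for word, _ in pairs)
phoneme_vocab = build_vocab(phonemes for _, phonemes in pairs)
max_letters = max(len(word) for word, _ in pairs)
max_phonemes = max(len(phonemes) for _, phonemes in pairs)

inputs = [encode(word, letter_vocab, max_letters) for word, _ in pairs]
targets = [encode(phonemes, phoneme_vocab, max_phonemes) for _, phonemes in pairs]

print(inputs)   # [[1, 2, 3, 4], [1, 2, 3, 5]]
print(targets)  # [[1, 2, 3, 4, 0], [5, 2, 6, 7, 8]]
```

The input and target sequences have different lengths, which is why a plain fixed-size network does not fit directly and some alignment or sequence-to-sequence setup is needed on top of this encoding.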