Is there a Python library that can convert words (mainly names) to ARPAbet phonetic transcription? For example:
BARBELS -> B AA1 R B AH0 L Z
BARBEQUE -> B AA1 R B IH0 K Y UW2
BARBEQUED -> B AA1 R B IH0 K Y UW2 D
BARBEQUEING -> B AA1 R B IH0 K Y UW2 IH0 NG
BARBEQUES -> B AA1 R B IH0 K Y UW2 Z
You can use a tiny utility from my listener project to do this. It uses espeak under the covers to generate IPA, then applies a mapping extracted from the CMU dictionary to produce the set of ARPAbet transcriptions that could match that IPA. For instance:
$ listener-arpa
we are testing
we
W IY
are
ER
AA
testing
T EH S T IH NG
This produces exact matches against the CMU dictionary about 45% of the time (I got around 36% using the documented correspondence in CMU/Wikipedia), while producing about 3 candidate transcriptions per word on average. That said, we see a "close match" about 99% of the time: while we might not precisely match the hand-marked-up word every time, we are generally off by only a few phonemes.
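To make "exact match" and "off by only a few phonemes" concrete, here is a minimal sketch of how candidates could be scored against a reference transcription. It is not the code used to get the figures above. The function names are hypothetical, and dropping the CMU stress digits before comparing is an assumption, made because the sample output above carries no stress marks.

```python
from typing import List, Sequence, Tuple


def phoneme_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance counted in whole phonemes, not characters."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(
                previous[j] + 1,         # drop a phoneme from a
                current[j - 1] + 1,      # insert a phoneme from b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def strip_stress(phonemes: Sequence[str]) -> List[str]:
    """Remove trailing CMU stress digits, e.g. 'AA1' -> 'AA' (assumption)."""
    return [p.rstrip("012") for p in phonemes]


def score(candidates: Sequence[str], reference: str) -> Tuple[bool, int]:
    """Return (is_exact_match, smallest_phoneme_distance) for one word."""
    ref = strip_stress(reference.split())
    if not candidates:
        return False, len(ref)
    best = min(
        phoneme_distance(strip_stress(c.split()), ref) for c in candidates
    )
    return best == 0, best
```

Averaging the first element of `score` over a word list would give an exact-match rate, and the second element says how many phonemes away the best candidate is.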
$ sudo apt-get install espeak
$ pip install -e git+https://github.com/mcfletch/listener.git#egg=listener
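The approach described in this answer (IPA first, then a lookup table that may offer several ARPAbet options per IPA symbol) can be sketched roughly as below. This is an illustration, not the listener project's code: the table is a tiny hand-written stand-in for the CMU-derived mapping, the function names are hypothetical, and the IPA strings in the usage example are typed by hand rather than captured from espeak (which can print IPA with `espeak -q --ipa`).

```python
from itertools import product
from typing import Dict, List

# Hypothetical, deliberately tiny IPA -> ARPAbet table (illustration only).
# A symbol may map to several ARPAbet phonemes, which is what yields
# multiple candidate transcriptions per word.
IPA_TO_ARPABET: Dict[str, List[str]] = {
    "b": ["B"],
    "l": ["L"],
    "s": ["S"],
    "t": ["T"],
    "w": ["W"],
    "z": ["Z"],
    "ŋ": ["NG"],
    "ɹ": ["R"],
    "iː": ["IY"],
    "ɑː": ["AA"],
    "ɛ": ["EH"],
    "ɪ": ["IH"],
    "ə": ["AH", "IH"],  # reduced vowel: ambiguous in this toy table
}

STRESS_MARKS = "ˈˌ"  # IPA primary and secondary stress, ignored here


def tokenize_ipa(ipa: str, table: Dict[str, List[str]]) -> List[str]:
    """Split an IPA string into table symbols, longest symbol first."""
    symbols = sorted(table, key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(ipa):
        if ipa[i] in STRESS_MARKS or ipa[i].isspace():
            i += 1
            continue
        for sym in symbols:
            if ipa.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            raise ValueError(f"no mapping for IPA at position {i}: {ipa[i]!r}")
    return tokens


def arpabet_candidates(
    ipa: str, table: Dict[str, List[str]] = IPA_TO_ARPABET
) -> List[str]:
    """Return every ARPAbet transcription the table allows for this IPA."""
    options = [table[token] for token in tokenize_ipa(ipa, table)]
    return [" ".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    print(arpabet_candidates("tˈɛstɪŋ"))
    print(arpabet_candidates("bˈɑːɹbəlz"))
```

With this toy table the first call gives a single candidate and the second gives two, because `ə` is ambiguous. A real table extracted from the CMU dictionary would be far larger and would produce more alternatives per word.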