
How can I check if a string can be pronounced?

I would like to programmatically check whether a string can be pronounced or needs to be spelled out.

For example, internationalization can be read out, but i18n cannot, nor can hhdirgxzf.

I can think of some simple heuristics such as checking whether the string contains non-alpha characters, but I hope there is a more robust and scientific way to do it. Are there algorithmic approaches that can score a string based on how easy it is to pronounce?

Related: "Is there a way to rank the difficulty of pronunciation of a word?" However, I don't have a list and I can't precompute.


Update based on comments.

  • As I'm an English speaker, I'm interested in English, but I could imagine an algorithm based on the way sound and speaking work rather than on the characteristics of a particular language.
  • By "pronounced" I mean the string can be read out naturally. It's possible to pronounce hhdirgxzf, but it would not sound like one natural-language word; it would need to be broken up.
  • A specific use case I have in mind is where I am sent strings and I want to use a basic text-to-speech system to read them out loud. I want to determine which tokens in the string to let the TTS system try to pronounce and which to make it spell out, erring on the side of spelling out if not confident.
brabster asked Aug 29 '12 10:08


1 Answer

You might have some success by first splitting the word into syllables: a token that does not break into plausible syllables probably has to be spelled out. This question on SO might help. Of course, this will only work for languages that, like English, are written in an alphabet with letters for vowel sounds.
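
The syllable idea can be sketched roughly as follows. This is only a minimal heuristic under stated assumptions, not a real syllabifier: it treats `a e i o u y` as the vowels, assumes plain ASCII English input, and uses a hypothetical limit (`MAX_CLUSTER`) on how many consonants may sit between vowel groups. The function name `looks_pronounceable` is made up for illustration.

```python
import re

# Assumptions for this sketch: ASCII English input, "y" counted as a vowel,
# and at most MAX_CLUSTER consonants in a row (a hypothetical threshold).
VOWELS = "aeiouy"
MAX_CLUSTER = 3


def looks_pronounceable(token: str) -> bool:
    """Return True only if the token plausibly splits into syllables."""
    word = token.lower()
    # Digits, punctuation, or non-ASCII letters: spell it out.
    if not re.fullmatch(r"[a-z]+", word):
        return False
    # Every syllable needs a vowel nucleus, so require at least one vowel.
    if not re.search(f"[{VOWELS}]", word):
        return False
    # Splitting on vowel runs leaves the consonant clusters between them.
    clusters = re.split(f"[{VOWELS}]+", word)
    # Long consonant clusters are treated as unpronounceable.
    return all(len(cluster) <= MAX_CLUSTER for cluster in clusters)


for token in ["internationalization", "i18n", "hhdirgxzf"]:
    action = "pronounce" if looks_pronounceable(token) else "spell out"
    print(f"{token}: {action}")
```

With these thresholds the check is deliberately conservative: a real word such as strengths is rejected because of its final cluster, which fits the stated preference for spelling out when not confident. Loosening `MAX_CLUSTER` trades that caution for more false positives.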

High Performance Mark answered Sep 19 '22 13:09