
How to make a function like IsWordPronounceable(SomeWord: String): Boolean;

I would like to make a function IsWordPronounceable(SomeWord: String): Boolean; for the English language. I am working with SAPI Speech Recognition and I need this function. I use the Delphi compiler, but C/C#/C++ or any other language is OK. Please help, I don't know how to start.

From the start, I thought adding a grammar rule could solve the problem. The scenario is to highlight the text that is being said to the user, but the engine cannot recognize words that are not pronounceable.

XBasic3000 asked May 06 '26 22:05


2 Answers

This is not exactly easy to do. The way I would do it is with some simple statistical analysis.

Start off by downloading a dictionary of English words (or any language, really - you just need a dictionary of words that are "pronounceable"). Then, take each word in the dictionary and break it up into 3-letter blocks. So given the word "dictionary", you'd break it up into "dic", "ict", "cti", "tio", "ion", "ona", "nar", and "ary". Then add each three-letter block from all the words in the dictionary into a collection that maps the three letter block to the number of times it appears. Something like this:

"dic" -> 36365
"ict" -> 2721
"cti" -> 532

And so on. Next, normalize the counts by dividing each one by the total number of words in the dictionary. That gives you a mapping from each three-letter combination to, roughly, the fraction of dictionary words that contain it (exactly so if you count a block only once per word).
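
The counting and normalizing steps above could be sketched roughly as follows. Python is used only because the question allows any language. The names `three_letter_blocks` and `build_block_frequency` and the tiny word list are hypothetical, standing in for a real dictionary file. One assumption goes slightly beyond the description: each block is counted at most once per word, so that dividing by the number of words really gives the fraction of words containing the block.

```python
from collections import Counter


def three_letter_blocks(word):
    """Return every run of three consecutive letters in word, in order."""
    word = word.lower()
    return [word[i:i + 3] for i in range(len(word) - 2)]


def build_block_frequency(words):
    """Map each three-letter block to the fraction of words containing it."""
    words = list(words)
    counts = Counter()
    for word in words:
        # set() counts a block once per word, even if it repeats inside the word.
        counts.update(set(three_letter_blocks(word)))
    total = len(words)
    return {block: count / total for block, count in counts.items()}


# Hypothetical stand-in for a real dictionary file.
sample_words = ["dictionary", "diction", "action", "nation"]
block_frequency = build_block_frequency(sample_words)
```

With a word list this small the numbers mean little; the point is only the shape of the table that the pronounceability check would then look blocks up in.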

Finally, implement your IsWordPronounceable method something like this:

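// Assumes blockFrequency is a Dictionary<string, double> built from the word list
// and THRESHOLD is a tunable double.
bool IsWordPronounceable(string word)
{
    string[] threeLetterBlocks = BreakIntoThreeLetterBlocks(word);
    foreach (string block in threeLetterBlocks)
    {
        // A block that never occurs in the dictionary is treated as too rare;
        // TryGetValue avoids the KeyNotFoundException the indexer would throw.
        double frequency;
        if (!blockFrequency.TryGetValue(block, out frequency) || frequency < THRESHOLD)
            return false;
    }
    return true;
}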

Obviously, there are a few parameters you'll want to "tune". The THRESHOLD parameter is one; the block size is another, and 2 or 4 letters might work better than 3. It'll take a bit of massaging to get it right, I think.
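
One way to explore those parameters is to turn the block size and threshold into arguments and score a few settings against words you already know to be pronounceable or not. The following is a rough Python sketch under that assumption. `blocks`, `build_frequency`, `is_pronounceable`, `accuracy` and both small word lists are hypothetical; a real run would need a full dictionary and a much larger labelled test set.

```python
from collections import Counter


def blocks(word, size):
    """Return every run of `size` consecutive letters in word."""
    word = word.lower()
    return [word[i:i + size] for i in range(len(word) - size + 1)]


def build_frequency(words, size):
    """Fraction of words that contain each block of the given size."""
    words = list(words)
    counts = Counter()
    for word in words:
        counts.update(set(blocks(word, size)))
    return {block: count / len(words) for block, count in counts.items()}


def is_pronounceable(word, frequency, size, threshold):
    """True if every block of the word is at least as common as threshold."""
    # Unseen blocks get frequency 0.0, so they fail any positive threshold.
    # A word shorter than `size` has no blocks and is therefore accepted.
    return all(frequency.get(b, 0.0) >= threshold for b in blocks(word, size))


def accuracy(labelled, frequency, size, threshold):
    """Share of (word, expected) pairs the check classifies correctly."""
    hits = sum(is_pronounceable(w, frequency, size, threshold) == expected
               for w, expected in labelled)
    return hits / len(labelled)


# Hypothetical data: a tiny "dictionary" and a hand-labelled test set.
dictionary = ["dictionary", "diction", "action", "nation", "station"]
labelled = [("dictation", True), ("notion", True), ("xqzt", False)]

# Try each combination of block size and threshold and print its score.
for size in (2, 3):
    frequency = build_frequency(dictionary, size)
    for threshold in (0.1, 0.5):
        print(size, threshold, accuracy(labelled, frequency, size, threshold))
```

With such a small dictionary, ordinary words like "notion" are rejected simply because some of their blocks were never seen, which is a reminder that the word list has to be large before the threshold means much.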

Dean Harding answered May 09 '26 12:05


Just an idea (maybe crazy): I've never tried that.
Can you feed the output of the Text-To-Speech into the input of the Speech-To-Text?
Then in a perfect world, anything not recognized (or not matching) in the end is not pronounceable.

Francesca answered May 09 '26 11:05