
Speech to Phoneme in .NET

I want to get the phonemes of recorded speech in C#. Say you have an audio file like "x.wav" that says "hello dear Shamim". I want to extract all the phonemes of the speech and their relative timings, something like the picture below:

[Image: Phoneme Editor]

I used the System.Speech library (both the Recognition and Synthesis namespaces) but I didn't find what I wanted. Now don't be mistaken! I don't want the phonemes of the known sentence "hello dear Shamim"; I want to extract the phonemes from an unknown audio input in which someone speaks an English sentence. I tried System.Speech.Recognition, but it tries to extract the words from the audio file, not the phonemes! And as you may have guessed, the words are 30% wrong! ;)

Shamim asked Dec 25 '13 08:12




2 Answers

Phoneme recognition requires a more specialized setup than word recognition, and most engines don't support it directly (a dictionary of single-phoneme "words" doesn't usually give good accuracy). A big reason is that phoneme recognition is much less accurate than word recognition: word recognition is more constrained, because it filters out every phone combination that doesn't map to a real word, which is most of them. HTK does support it, though. You can use it from C# by executing shell commands (there's nothing evil in doing that) or by P/Invoking the libraries.
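
The shell-command route can be sketched roughly as below. This is only an illustration under several assumptions: HTK's HVite decoder is on the PATH, you have already prepared acoustic models, a phone-loop network, a dictionary, and a phone list, and the file names used here (hmmdefs, phoneloop.net, dict, monophones, x.mfc, recout.mlf) are all hypothetical placeholders. The parser relies on HTK label lines having the form "start end label [score]", with times as integers in units of 100 ns.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.IO;

// One recognized phone with its timing converted to seconds.
public sealed class PhoneSegment
{
    public string Label;
    public double StartSeconds;
    public double EndSeconds;
}

public static class HtkPhoneSketch
{
    // HTK label files store times as integers in units of 100 ns.
    private const double HtkUnitsPerSecond = 1e7;

    // Runs an external tool, waits for it to finish, and returns its exit code.
    public static int RunTool(string exe, string arguments)
    {
        var info = new ProcessStartInfo(exe, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    // Parses label lines of the form "start end label [score]".
    public static List<PhoneSegment> ParseLabels(IEnumerable<string> lines)
    {
        var segments = new List<PhoneSegment>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            // Skip blanks, the "#!MLF!#" header, quoted file names, and "." terminators.
            if (line.Length == 0 || line == "." || line[0] == '#' || line[0] == '"')
                continue;

            string[] fields = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            long start, end;
            if (fields.Length < 3
                || !long.TryParse(fields[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out start)
                || !long.TryParse(fields[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out end))
                continue; // Not a timed label line.

            segments.Add(new PhoneSegment
            {
                Label = fields[2],
                StartSeconds = start / HtkUnitsPerSecond,
                EndSeconds = end / HtkUnitsPerSecond
            });
        }
        return segments;
    }

    public static void Main()
    {
        // Hypothetical invocation: every file name here is a placeholder for your own setup.
        int exitCode = RunTool("HVite", "-H hmmdefs -i recout.mlf -w phoneloop.net dict monophones x.mfc");
        if (exitCode != 0)
        {
            Console.Error.WriteLine("HVite exited with code " + exitCode);
            return;
        }

        foreach (PhoneSegment s in ParseLabels(File.ReadLines("recout.mlf")))
            Console.WriteLine("{0}\t{1:F3}\t{2:F3}", s.Label, s.StartSeconds, s.EndSeconds);
    }
}
```

Keeping the parsing separate from the process call means you can check the timing conversion on a saved label file before wiring up the decoder.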

Aleksandr Dubinsky answered Sep 23 '22 15:09



Try the System.Speech.Recognition.DictationGrammar constructor that takes a string argument, and pass "grammar:dictation#pronunciation" as the argument. Alternatively, with raw SAPI (using the SpeechLib interop assembly), you can select the pronunciation grammar by calling ISpRecoGrammar::LoadDictation with "Pronunciation" as the dictation topic.
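
A minimal sketch of the System.Speech variant might look like this. It assumes a Windows machine with a desktop speech recognizer installed and a reference to the System.Speech assembly; "x.wav" is just the example file name from the question. I print both Text and Pronunciation for each unit, because which of the two carries the phoneme string with this grammar is something you should check on your own system. The timing shown is per recognized phrase, not per phoneme.

```csharp
using System;
using System.Speech.Recognition;

class PhonemeDictationSketch
{
    static void Main(string[] args)
    {
        // Path to the recording; "x.wav" is the example name from the question.
        string wavPath = args.Length > 0 ? args[0] : "x.wav";

        using (var engine = new SpeechRecognitionEngine())
        {
            // The special topic string requests the pronunciation dictation grammar.
            engine.LoadGrammar(new DictationGrammar("grammar:dictation#pronunciation"));
            engine.SetInputToWaveFile(wavPath);

            RecognitionResult result;
            // Recognize() returns null when it produces no further result.
            while ((result = engine.Recognize()) != null)
            {
                // Phrase-level position in the input audio, when the engine provides it.
                if (result.Audio != null)
                {
                    Console.WriteLine("Phrase at {0}, duration {1}",
                        result.Audio.AudioPosition, result.Audio.Duration);
                }

                foreach (RecognizedWordUnit unit in result.Words)
                {
                    Console.WriteLine("{0}\t{1}\t{2:F2}",
                        unit.Text, unit.Pronunciation, unit.Confidence);
                }
            }
        }
    }
}
```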

Eric Brown answered Sep 26 '22 15:09
