Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speech recognition using a real time stream

Firstly, in order to clarify my goal: I am using the CSCore library and capturing background audio using the WasapiLoopbackCapture class, and I intend to use that as a real time input for a System.Speech.Recognition recognition engine. That class either outputs the data to a .WAV file or to a Stream. I then tried doing this:

    private void startButton_Click(object sender, EventArgs e)
    {
        _recognitionEngine.UnloadAllGrammars();
        _recognitionEngine.LoadGrammar(new DictationGrammar());

        LoadTargetDevice();
        StartStreamCapture(); // Here I am starting the capture to _stream (MemoryStream type)

        _stream.Position = 0; // Without setting this, I get a stream format exception.

        _recognitionEngine.SetInputToWaveStream(_stream);
        _recognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }

The result is that I don't get an exception but I also don't get the SpeechRecognized or SpeechDetected events firing. I suspect this is because the System.Speech.Recognition assembly does not support real time streams. I searched online and someone reports implementing a custom Stream type as a workaround, but I was unable to follow the instructions on the post which were unclear (see Dexter Morgan's reply here).

I am aware this problem is best solved by using a different library or an alternate approach, but I would like to know how to do this makeshift implementation specifically, mostly for knowledge purposes.

Thanks!

like image 975
Johnson Whitler Avatar asked Jan 22 '18 14:01

Johnson Whitler


People also ask

What is real time text speech?

Speech-to-text, also known as speech recognition, enables real-time or offline transcription of audio streams into text. For a full list of available speech-to-text languages, see Language and voice support for the Speech service. Microsoft uses the same recognition technology for Cortana and Office products.

What is a streaming speech?

Streaming speech recognition allows you to stream audio to Speech-to-Text and receive a stream speech recognition results in real time as the audio is processed. See also the audio limits for streaming speech recognition requests. Streaming speech recognition is available via gRPC only.

Which algorithm is best for speech recognition?

Dubbed the Recurrent Neural Network (RNN), these algorithms are ideal for sequential data like speech because they're able to “remember” what came before and use their previous output as input for their next move.


1 Answers

@Justcarty thanks for the clarification, here is my Explanation why Code of OP wont work and what need to be done in order to make it work.

In C# for the speech recongintion and synthesis , you probably confused by the documentation where we are having two Speech DLL's
1. Microsoft Speech DLL (Microsoft.speech.dll) 2. System Speech DLL (System.Speech.Dll)

System.speech dll is a part of the windows OS . The two libraries are similar in the sense that the APIs are almost, but not quite, the same. So, if you’re searching online for speech examples , from the code snippets you get you may not tell whether they explaining to System.Speech or Microsoft.Speech.

So for Adding a Speech to the C# application you need to use the Microsoft.Speech library, not the System.Speech library.

Some of the key differences are summarized belows

|-------------------------|---------------------|
|  Microsoft.Speech.dll    | System.Speech.dll  |
|-------------------------|---------------------|
|Must install separately  |                     |
|                         | Part of the OS      |
|                         |  (Windows Vista+)   |
|-------------------------|---------------------|
|Must construct Grammars  | Uses Grammars or    |
|                           free dictation      |
| ------------------------|--------------------|

For more Read the Following Article , it explains the correct way to implement

like image 112
Tummala Krishna Kishore Avatar answered Sep 21 '22 13:09

Tummala Krishna Kishore