First, to clarify my goal: I am using the CSCore library to capture background audio with the WasapiLoopbackCapture
class, and I intend to use that as a real-time input for a System.Speech.Recognition
recognition engine. That class can write the captured data either to a .WAV file or to a Stream. I then tried this:
private void startButton_Click(object sender, EventArgs e)
{
    _recognitionEngine.UnloadAllGrammars();
    _recognitionEngine.LoadGrammar(new DictationGrammar());
    LoadTargetDevice();
    StartStreamCapture(); // Here I start capturing into _stream (a MemoryStream)
    _stream.Position = 0; // Without setting this, I get a stream format exception.
    _recognitionEngine.SetInputToWaveStream(_stream);
    _recognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
}
The result is that I get no exception, but the SpeechRecognized
and SpeechDetected
events never fire. I suspect this is because the System.Speech.Recognition
assembly does not support real-time streams. Searching online, I found someone who reports implementing a custom Stream
type as a workaround, but I was unable to follow the instructions in the post, which were unclear (see Dexter Morgan's reply here).
I am aware this problem is best solved by using a different library or an alternate approach, but I would like to know how to do this makeshift implementation specifically, mostly for learning purposes.
Thanks!
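For reference, the workaround mentioned above can be sketched as a custom Stream whose Read blocks until audio data arrives, so the recognizer never sees a premature end-of-stream (which is what happens with a plain MemoryStream). This is a minimal sketch, not the code from the linked post; the class name SpeechStreamer and the buffering details are assumptions of mine:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical "SpeechStreamer": a write-once-read-forever Stream.
// Read() blocks while the buffer is empty instead of returning 0,
// so the recognition engine keeps waiting for more audio.
public class SpeechStreamer : Stream
{
    private readonly object _lock = new object();
    private readonly Queue<byte> _buffer = new Queue<byte>();
    private bool _closed;

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => -1;
    public override long Position { get => 0; set { } }

    public override int Read(byte[] buffer, int offset, int count)
    {
        lock (_lock)
        {
            // Block until at least one byte is available, or the stream is closed.
            while (_buffer.Count == 0 && !_closed)
                Monitor.Wait(_lock);

            int read = 0;
            while (read < count && _buffer.Count > 0)
                buffer[offset + read++] = _buffer.Dequeue();
            return read; // returns 0 only after Close(), signalling end-of-stream
        }
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        lock (_lock)
        {
            for (int i = 0; i < count; i++)
                _buffer.Enqueue(buffer[offset + i]);
            Monitor.PulseAll(_lock); // wake any blocked Read()
        }
    }

    public override void Close()
    {
        lock (_lock) { _closed = true; Monitor.PulseAll(_lock); }
        base.Close();
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

You would then Write the capture's DataAvailable bytes into this stream and pass it to the engine with SetInputToAudioStream and an explicit SpeechAudioFormatInfo (there is no WAV header on raw capture data). Note also that WASAPI loopback typically delivers 32-bit float samples, which System.Speech cannot consume directly, so you would need to convert to 16-bit PCM first.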
@Justcarty thanks for the clarification. Here is my explanation of why the OP's code won't work and what needs to be done to make it work.
For speech recognition and synthesis in C#, the documentation can be confusing because there are two speech DLLs:
1. Microsoft Speech DLL (Microsoft.Speech.dll)
2. System Speech DLL (System.Speech.dll)
System.Speech.dll is part of the Windows OS. The two libraries are similar in the sense that their APIs are almost, but not quite, the same. So if you search online for speech examples, you often cannot tell from a code snippet whether it refers to System.Speech
or Microsoft.Speech
.
So for adding speech to your C# application, you need to use the Microsoft.Speech library
, not the System.Speech library
.
Some of the key differences are summarized below:
|-------------------------|-------------------------|
| Microsoft.Speech.dll    | System.Speech.dll       |
|-------------------------|-------------------------|
| Must install separately | Part of the OS          |
|                         | (Windows Vista+)        |
|-------------------------|-------------------------|
| Must construct Grammars | Uses Grammars or        |
|                         | free dictation          |
|-------------------------|-------------------------|
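To illustrate the last row of the table: with System.Speech you can load free-form dictation directly, whereas with Microsoft.Speech every grammar must be constructed explicitly. A minimal sketch, assuming the System.Speech namespace (with Microsoft.Speech, the types come from Microsoft.Speech.Recognition instead, and DictationGrammar is not available):

```csharp
using System.Speech.Recognition;

class GrammarDemo
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // System.Speech only: free-form dictation, no grammar authoring needed.
            engine.LoadGrammar(new DictationGrammar());

            // Works in both libraries: an explicitly constructed command grammar.
            var commands = new Choices("start", "stop", "pause");
            engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));
        }
    }
}
```

Running this requires the Windows speech stack, so it is illustrative rather than portable.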
For more detail, read the following article; it explains the correct way to implement this.