 

Does the MS Speech Platform 11 Recognizer support ARPA compiled grammars?

How can I use ARPA files with MS Speech? The documentation for the Microsoft Speech Platform 11 Recognizer implies that one can compile a grammar from an ARPA file.

I am able to compile an ARPA file -- for instance, the tiny example provided by Microsoft -- using the following command line:

CompileGrammar.exe -In stock.arpa -InFormat ARPA
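For reference, an ARPA language model is a plain-text n-gram file. The sketch below shows the general shape of such a file for this vocabulary; it is illustrative only, not Microsoft's exact stock.arpa (the vocabulary, counts, and log probabilities here are made up, and sentence-boundary tokens are omitted for brevity):

\data\
ngram 1=5
ngram 2=4

\1-grams:
-1.0 will
-1.0 stock
-1.0 go
-1.0 up
-1.0 down

\2-grams:
-0.3 will stock
-0.3 stock go
-0.3 go up
-0.3 go down

\end\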

I'm able to use the resulting CFG file in the following test:

using Microsoft.Speech.Recognition;
using System.Globalization;

// ...

using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
{
    engine.LoadGrammar(new Grammar("stock.cfg"));
    var result = engine.EmulateRecognize("will stock go up");
    Assert.That(result, Is.Not.Null);
}

This test passes, but note that it uses EmulateRecognize(). When I switch to using an actual audio file, like this:

using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US"))) 
{
    engine.LoadGrammar(new Grammar("stock.cfg"));
    engine.SetInputToWaveFile("go-up.wav");
    var result = engine.Recognize();
}

result is always null and the test fails.

Microsoft states quite clearly that it's supported, yet even very simple examples don't seem to work. What am I doing wrong?

asked Sep 18 '18 by ladenedge

2 Answers

For your question:

Does the MS Speech Platform 11 Recognizer support ARPA compiled grammars?

The answer is Yes.

The code below works on my side; just change the three inputs (culture, grammar, and wave file) to match yours. I don't know your full code, but based on my test and the demo code, I suspect the root cause is that the SpeechRecognized event needs to be handled, which you may not have done on your side.

using System;
using System.Globalization;
using System.Speech.Recognition;

class Program
{
    static bool completed;

    static void Main(string[] args)
    {
        // Initialize an in-process speech recognition engine.
        using (SpeechRecognitionEngine recognizer =
            new SpeechRecognitionEngine(new CultureInfo("en-us")))
        {
            // Create and load a grammar.
            Grammar dictation = new Grammar("stock.cfg");
            dictation.Name = "Dictation Grammar";

            recognizer.LoadGrammar(dictation);

            // Configure the input to the recognizer.
            recognizer.SetInputToWaveFile("test.wav");

            // Attach event handlers for the results of recognition.
            recognizer.SpeechRecognized +=
                new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
            recognizer.RecognizeCompleted +=
                new EventHandler<RecognizeCompletedEventArgs>(recognizer_RecognizeCompleted);

            // Perform recognition on the entire file.
            Console.WriteLine("Starting asynchronous recognition...");
            completed = false;
            recognizer.RecognizeAsync();

            // Keep the console window open until recognition completes.
            while (!completed)
            {
                Console.ReadLine();
            }
            Console.WriteLine("Done.");
        }

        Console.WriteLine();
        Console.WriteLine("Press any key to exit...");
        Console.ReadKey();
    }

    // Handle the SpeechRecognized event.
    static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (e.Result != null && e.Result.Text != null)
        {
            Console.WriteLine("  Recognized text =  {0}", e.Result.Text);
        }
        else
        {
            Console.WriteLine("  Recognized text not available.");
        }
    }

    // Handle the RecognizeCompleted event.
    static void recognizer_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
    {
        if (e.Error != null)
        {
            Console.WriteLine("  Error encountered, {0}: {1}",
                e.Error.GetType().Name, e.Error.Message);
        }
        if (e.Cancelled)
        {
            Console.WriteLine("  Operation cancelled.");
        }
        if (e.InputStreamEnded)
        {
            Console.WriteLine("  End of stream encountered.");
        }
        completed = true;
    }
}


And the content of the wav file is just "will stock go up" (the duration is about 2 seconds).

For more information: https://docs.microsoft.com/en-us/dotnet/api/system.speech.recognition.speechrecognitionengine.setinputtowavefile?redirectedfrom=MSDN&view=netframework-4.7.2#System_Speech_Recognition_SpeechRecognitionEngine_SetInputToWaveFile_System_String_
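If you don't have a suitable recording handy, one way to synthesize a matching test file is with the desktop System.Speech.Synthesis API (a minimal sketch; "test.wav" is just the file name used in the code above):

using System.Speech.Synthesis;

// Generate a short test recording containing the target phrase.
using (var synth = new SpeechSynthesizer())
{
    synth.SetOutputToWaveFile("test.wav");
    synth.Speak("will stock go up");
}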

answered by seiya1223


There are two different answers to this question depending on which version of the Microsoft Speech SDK you're using. (See: What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition? )

System.Speech (Desktop Version)

In this case, see seiya1223's answer. The sample code there works great.

Microsoft.Speech (Server Version)

Perhaps because the server version does not include the "dictation engine," the Microsoft.Speech library will apparently never match an ARPA-sourced CFG. However, it will still hypothesize what was said via the SpeechRecognitionRejected event. Here are the necessary changes from seiya1223's desktop code:

  1. Change your using statement from System.Speech to Microsoft.Speech, of course.
  2. Add an event handler for the SpeechRecognitionRejected event.
  3. In your event handler, examine the e.Result.Text property for the final hypothesis.

The following snippet should help illustrate:

static string transcription;

static void Main(string[] args)  
{
  using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-us")))
  {
    recognizer.SpeechRecognitionRejected += SpeechRecognitionRejectedHandler;
    // ...
  }
}

static void SpeechRecognitionRejectedHandler(object sender, SpeechRecognitionRejectedEventArgs e)
{
  if (e.Result != null && !string.IsNullOrEmpty(e.Result.Text))
    transcription = e.Result.Text;
}

This handler is called once at the end of recognition. For example, here is the output from seiya1223's code, but using all of the available event handlers and a bunch of extra logging; note the SpeechRecognitionRejectedHandler entry near the end:

Starting asynchronous recognition...
In SpeechDetectedHandler:
- AudioPosition = 00:00:01.2300000
In SpeechHypothesizedHandler:
- Grammar Name = Stock; Result Text = Go
In SpeechHypothesizedHandler:
- Grammar Name = Stock; Result Text = will
In SpeechHypothesizedHandler:
- Grammar Name = Stock; Result Text = will Stock
In SpeechHypothesizedHandler:
- Grammar Name = Stock; Result Text = will Stock Go
In SpeechHypothesizedHandler:
- Grammar Name = Stock; Result Text = will Stock Go Up
In SpeechRecognitionRejectedHandler:
- Grammar Name = Stock; Result Text = will Stock Go Up

In RecognizeCompletedHandler.
- AudioPosition = 00:00:03.2000000; InputStreamEnded = True
- No result.
Done.
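For reference, the extra handlers that produced the log above can be wired up roughly as follows. This is a reconstruction based on the transcript, not the exact original code; all four events and their event-args properties exist on SpeechRecognitionEngine in both the System.Speech and Microsoft.Speech libraries:

// Hypothetical wiring for the logging handlers shown in the transcript.
recognizer.SpeechDetected += (s, e) =>
{
    Console.WriteLine("In SpeechDetectedHandler:");
    Console.WriteLine("- AudioPosition = {0}", e.AudioPosition);
};
recognizer.SpeechHypothesized += (s, e) =>
{
    Console.WriteLine("In SpeechHypothesizedHandler:");
    Console.WriteLine("- Grammar Name = {0}; Result Text = {1}",
        e.Result.Grammar.Name, e.Result.Text);
};
recognizer.SpeechRecognitionRejected += (s, e) =>
{
    Console.WriteLine("In SpeechRecognitionRejectedHandler:");
    Console.WriteLine("- Grammar Name = {0}; Result Text = {1}",
        e.Result.Grammar.Name, e.Result.Text);
};
recognizer.RecognizeCompleted += (s, e) =>
{
    Console.WriteLine("In RecognizeCompletedHandler.");
    Console.WriteLine("- AudioPosition = {0}; InputStreamEnded = {1}",
        e.AudioPosition, e.InputStreamEnded);
    Console.WriteLine(e.Result == null ? "- No result." : "- Result: " + e.Result.Text);
};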

answered by ladenedge