 

SpeechRecognition recognizes background noise as speech

I'm using the .NET SpeechRecognitionEngine (System.Speech.Recognition) in my program. The problem is that it recognizes background noise as speech.

For example, if I snap my fingers, tap the table, or move my chair, it picks that up as speech.

Why in the world does it recognize background noise as speech?

Me snapping my fingers does not sound the same as me saying "Notepad"!

Here is the code:

using System;
using System.Threading;
using System.Speech;
using System.Speech.Synthesis;
using System.Speech.Recognition;

namespace SpeachTest
{
    public class MainClass
    {
        static void Main()
        {
            MainClass main = new MainClass();
            SpeechRecognitionEngine sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

            Choices choiceList = new Choices();
            choiceList.Add(new string[] { "Open", "Close", "Then", "Volume", "Up", "Firefox", "Notepad", "Steam", "turn", "the", "now" });

            GrammarBuilder builder = new GrammarBuilder();
            builder.Append(choiceList);
            Grammar grammar = new Grammar(new GrammarBuilder(builder, 0, 10));

            sre.SpeechRecognized += main.sreRecognizedEvent;
            sre.SpeechDetected += main.sreDetectEvent;
            sre.SpeechRecognitionRejected += main.sreRejectEvent;
            sre.RecognizeCompleted += main.sreCompleteEvent;

            sre.InitialSilenceTimeout = TimeSpan.FromSeconds(0);
            sre.BabbleTimeout = TimeSpan.FromSeconds(0);
            sre.EndSilenceTimeout = TimeSpan.FromSeconds(0);
            sre.EndSilenceTimeoutAmbiguous = TimeSpan.FromSeconds(0);

            sre.SetInputToDefaultAudioDevice();
            sre.LoadGrammar(grammar);

            while (true)
            {
                sre.Recognize();
            }
        }

        void sreRecognizedEvent(Object sender, SpeechRecognizedEventArgs e)
        {
            Console.Write("Recognized ~ " + e.Result.Text + " ~ with confidence " + e.Result.Confidence);
            Console.WriteLine();
        }

        void sreDetectEvent(Object sender, SpeechDetectedEventArgs e)
        {
            Console.WriteLine("Detected some type of input");
        }

        void sreRejectEvent(Object sender, SpeechRecognitionRejectedEventArgs e)
        {
            Console.WriteLine("Rejected input ~ " + e.Result.Text);
        }

        void sreCompleteEvent(Object sender, System.Speech.Recognition.RecognizeCompletedEventArgs e)
        {
            Console.WriteLine("Completed recognition");
        }
    }
}
asked Feb 16 '15 by JackBarn

3 Answers

Without resorting to any filtering algorithms, you can check the Confidence property that you're already displaying. It ranges from 0.0 to 1.0, where 1.0 means the engine is very confident. I find 0.7 works well, but you can tune the threshold by trial and error.

void sreRecognizedEvent(Object sender, SpeechRecognizedEventArgs e)
{
    if(e.Result.Confidence >= 0.7)
    {
        Console.Write("Reconized ~ " + e.Result.Text + " ~ with confidence " + e.Result.Confidence);
        Console.WriteLine();
    }        
}
answered by keyboardP

Which nonspeech sounds are rejected, and how strongly, varies greatly from recognizer to recognizer. My experience with the Microsoft recognizer is that it tries very hard to find words. For example, with DragonDictate or Google web recognition you can snap your fingers or cough and the sound is rejected. The Microsoft recognizer also aggressively tracks the audio level, so if it listens to a lot of quiet it effectively increases the gain by scaling down its detection thresholds. (I have experienced it recognizing the rustling of papers or the sound of the air-conditioning as human speech.)

A solution I have used for many years with great success is somewhat counter-intuitive. You need to add your own "garbage" speech model. Since you are just using a list of words and not a sophisticated grammar this should work well and be easy to do.

You are currently listening for: "Open", "Close", "Then", "Volume", "Up", "Firefox", "Notepad", "Steam","turn", "the", "now"

You should add to the list (that you are listening for) some words that are somewhat, but not too, similar. For instance, "apron" and "happen" will effectively be honey traps in the vicinity of the word "open". You can trust more that the person actually said "open" when it shows up as a result. Additionally, adding a number of short words that have nothing to do with your command words will trap more nonspeech sounds. I suspect that "tap" is likely to be recognized when you snap your fingers.

To summarize: recognize this longer list of words, but only act on the ones that are in your list of commands. If you are using a switch/case statement in your code, this is absurdly simple: only branch on your commands. Otherwise you need to test against the "good" list.

Note: This technique also works when you are doing more complex recognition using a speech recognition grammar. You just put all these "garbage" phrases under a grammar rule named "garbage" and you can reject any utterance that was recognized by that rule.
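A minimal sketch of this approach in C#, reusing the setup from the question. The garbage words below ("apron", "happen", "tap", and so on) are just example honey traps, and the commandSet check stands in for whatever dispatch logic you already have:

using System;
using System.Collections.Generic;
using System.Speech.Recognition;

namespace SpeachTest
{
    public class GarbageModelSketch
    {
        // Words you actually act on (the question's command list).
        static readonly string[] Commands =
        {
            "Open", "Close", "Then", "Volume", "Up", "Firefox",
            "Notepad", "Steam", "turn", "the", "now"
        };

        // Example honey-trap words: similar to the commands, or short sounds,
        // so noise and near-misses land here instead of on a command.
        static readonly string[] Garbage = { "apron", "happen", "tap", "clap", "pop", "knock" };

        static void Main()
        {
            var commandSet = new HashSet<string>(Commands, StringComparer.OrdinalIgnoreCase);

            var sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

            // Listen for both the real commands and the garbage words.
            var choiceList = new Choices();
            choiceList.Add(Commands);
            choiceList.Add(Garbage);

            var builder = new GrammarBuilder();
            builder.Append(choiceList);
            sre.LoadGrammar(new Grammar(new GrammarBuilder(builder, 0, 10)));

            sre.SpeechRecognized += (sender, e) =>
            {
                // Only act on words from the command list; anything that
                // matched a garbage word is silently ignored.
                foreach (var word in e.Result.Words)
                {
                    if (commandSet.Contains(word.Text))
                        Console.WriteLine("Command: " + word.Text);
                }
            };

            sre.SetInputToDefaultAudioDevice();

            while (true)
            {
                sre.Recognize();
            }
        }
    }
}

With a switch statement the filtering is implicit: the garbage words match the grammar but simply have no case, so nothing happens.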

answered by industrialpoet

Turns out my microphone sensitivity was too high. Very, very high, to be exact. It was at 100, meaning that it would pick up the smallest sounds (such as background noise).

My guess is that those small sounds were amplified to such a high degree that the SpeechRecognitionEngine had difficulty differentiating them from actual speech.

Turning the sensitivity down to around 20 or 30 did the trick. (Screenshot: a more appropriate sensitivity setting.)
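As a side note, if you want to spot an overly hot microphone from code rather than by eyeballing the Windows sound settings, System.Speech reports its input level through the SpeechRecognitionEngine.AudioLevelUpdated event. A rough sketch (the threshold of 80 is just an arbitrary example value):

using System;
using System.Speech.Recognition;

class AudioLevelSketch
{
    static void Main()
    {
        var sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
        sre.LoadGrammar(new DictationGrammar());
        sre.SetInputToDefaultAudioDevice();

        // AudioLevel is reported on a 0-100 scale; if it sits near the top
        // even when the room is quiet, the microphone gain is probably too high.
        sre.AudioLevelUpdated += (sender, e) =>
        {
            if (e.AudioLevel > 80)
                Console.WriteLine("Audio level " + e.AudioLevel + " - microphone may be too sensitive");
        };

        sre.RecognizeAsync(RecognizeMode.Multiple);
        Console.ReadLine();
    }
}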

answered by JackBarn