Android SpeechRecognizer "confidence" values are confusing

I'm using the SpeechRecognizer via Intent:

Intent i = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
i.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);

i.putExtra(RecognizerIntent.EXTRA_PROMPT,
        "straight talk please");

i.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
i.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");

startActivityForResult(i, 0);

And I get the results in onActivityResult() like this:

protected void onActivityResult(int requestCode, int resultCode, Intent data) {

    if (requestCode == 0 && resultCode == RESULT_OK) {

        // List with the results from the Voice Recognition API
        ArrayList<String> results = data
                .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);

        // The confidence array
        float[] confidence = data.getFloatArrayExtra(
                RecognizerIntent.EXTRA_CONFIDENCE_SCORES);

        // The confidence results       
        for (int i = 0; i < confidence.length; i++) {
            Log.v("oAR", "confidence[" + i + "] = " + confidence[i]);
        }
    }

    super.onActivityResult(requestCode, resultCode, data);
}

But every element of the float array is 0.0 except the first one, like this:

confidence[0] = any value between 0 and 1
confidence[1] = 0.0
confidence[2] = 0.0
and so on

I would expect that every result has a confidence value between 0 and 1. Otherwise it seems pretty useless, because the result with the highest confidence will be the first element by default, without using the EXTRA_CONFIDENCE_SCORES. Is there something I'm missing?

Furthermore, RecognizerIntent.EXTRA_CONFIDENCE_SCORES is documented as available from API level 14 onwards. But whichever API level above 8 I use, the result stays the same. Are the docs out of date on that point?

Steve Benett asked Sep 12 '13 13:09


3 Answers

According to my interpretation of the documentation:

RecognizerIntent.EXTRA_RESULTS returns an ordered ArrayList of strings, each of which is one suggestion as to what was said, with the string at index 0 being the suggestion the recognizer is most confident of.

RecognizerIntent.EXTRA_CONFIDENCE_SCORES returns an array of floats, one for each of these suggestions, at the same index.

So, if the results you are getting are correct (otherwise this might be a bug), then the recognizer has one, and only one, suggestion that it has confidence in and several others that it has only negligible or no confidence in.

I've been getting similar results. Just like you, I've never had a set of results in which more than one suggestion had non-negligible confidence, e.g. 0.7435, 0.0, 0.0, 0.0, ...

I have, however, sometimes gotten a set of results in which ALL suggestions have negligible confidence, e.g. 0.0, 0.0, 0.0, 0.0, 0.0, ...

So yes, the first element in the results will always be the one the recognizer is most confident of.
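As a minimal sketch of the "parallel arrays" reading above, the following plain Java pairs each suggestion with the score at the same index. ScoredResults, Scored, pair and countScored are hypothetical names and not part of the Android SDK. The sketch assumes the confidence array may be null or shorter than the result list, and treats a negative value as "no score available", which is how the RecognizerIntent documentation describes -1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Android SDK. */
final class ScoredResults {

    /** One recognition suggestion with its score; confidence is null when unknown. */
    static final class Scored {
        final String text;
        final Float confidence;

        Scored(String text, Float confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /**
     * Pairs results.get(i) with confidence[i]. A null array, a missing index,
     * or a negative value is treated as "no score available".
     */
    static List<Scored> pair(List<String> results, float[] confidence) {
        List<Scored> out = new ArrayList<>();
        for (int i = 0; i < results.size(); i++) {
            Float score = null;
            if (confidence != null && i < confidence.length && confidence[i] >= 0f) {
                score = confidence[i];
            }
            out.add(new Scored(results.get(i), score));
        }
        return out;
    }

    /** Counts suggestions whose score is present and strictly above zero. */
    static int countScored(List<Scored> scored) {
        int n = 0;
        for (Scored s : scored) {
            if (s.confidence != null && s.confidence > 0f) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Sample data shaped like the results described in this answer
        List<String> results = Arrays.asList("straight talk", "straight walk", "straight chalk");
        float[] confidence = {0.7435f, 0.0f, 0.0f};

        List<Scored> scored = pair(results, confidence);
        for (Scored s : scored) {
            System.out.println(s.text + " -> " + s.confidence);
        }
        System.out.println("suggestions with a non-zero score: " + countScored(scored));
    }
}
```

In onActivityResult() the two arguments would come from getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS) and getFloatArrayExtra(RecognizerIntent.EXTRA_CONFIDENCE_SCORES). With the behaviour described here, countScored would report at most one scored suggestion.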

Owen Ryan answered Nov 15 '22 13:11


I haven't worked with speech recognition. Still, since you say the float array only gives you 0.0, it is worth ruling out that the array itself is null. Can you please check whether getFloatArrayExtra() returns null or an actual array?

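ArrayList<String> results = data
        .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);

// May be null, so check before indexing into it
float[] confidence = data.getFloatArrayExtra(
        RecognizerIntent.EXTRA_CONFIDENCE_SCORES);

// TAG is assumed to be your own log tag constant
if (confidence == null) {
    // No scores supplied: log the suggestions only
    for (int i = 0; i < results.size(); i++) {
        Log.d(TAG, i + ": " + results.get(i));
    }
} else {
    // Log each suggestion next to its confidence score
    for (int i = 0; i < results.size() && i < confidence.length; i++) {
        Log.d(TAG, i + ": " + results.get(i) + " confidence: " + confidence[i]);
    }
}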

You could also check the book Professional Android Sensor Programming by Greg Milette and Adam Stroud, which should help. There are some details on page 394.

Pradip answered Nov 15 '22 13:11


The conventional speech recognition algorithm can return a confidence only for the 1-best result, because that confidence is calculated by comparing the best result against the other results. It is also possible to return the N best results instead of just the 1-best; however, it is much harder to calculate a confidence for each of them.
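To make "compared with other results" concrete, here is a minimal sketch of one textbook-style way to derive a 1-best confidence: normalize the best hypothesis's score against the scores of all competing hypotheses. This is an assumption for illustration only, not Google's implementation. OneBestConfidence and confidenceOfBest are hypothetical names, and the inputs are assumed to be log-domain scores such as log-likelihoods.

```java
/** Hypothetical illustration; not how any particular recognizer is known to work. */
final class OneBestConfidence {

    /**
     * Returns exp(best) / sum(exp(all)) for log-domain scores, where "best"
     * is the highest score. The maximum is subtracted first for numerical
     * stability, so the numerator becomes exp(0) = 1.
     */
    static double confidenceOfBest(double[] logScores) {
        if (logScores == null || logScores.length == 0) {
            throw new IllegalArgumentException("need at least one hypothesis");
        }
        double max = logScores[0];
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores) {
            sum += Math.exp(s - max);
        }
        return 1.0 / sum;
    }

    public static void main(String[] args) {
        // Made-up scores: one strong hypothesis and two weaker competitors
        double[] logScores = {-10.0, -12.0, -13.0};
        System.out.println("1-best confidence: " + confidenceOfBest(logScores));
    }
}
```

The value only has meaning relative to the competitors: a single hypothesis always scores 1.0, and two equally likely hypotheses give 0.5. Under this sketch the number describes the top result's standing against the rest, which says nothing directly about how trustworthy each lower-ranked hypothesis is.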

It seems that Google implemented the conventional approach only and reserved place in the API for more detailed results with n-best confidence.

You just have to wait for Google to implement everything properly.

Nikolay Shmyrev answered Nov 15 '22 14:11