
Speech recognition API duplicated phrases on Android

I found that the speech recognition API duplicates result phrases on my Android device (and does not duplicate them on desktop).

For each phrase spoken, it returns two results. The first one is

enter image description here

and the second one is

enter image description here

As you can see, in the second return the phrase is duplicated: each copy is marked as final, and the second copy is beyond resultIndex. In the first return there is only one copy; it is final and it is beyond resultIndex.
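For anyone who wants to reproduce the dumps above as text, here is a minimal sketch of a diagnostic helper. The name describeResults is hypothetical, and "beyond resultIndex" is read here as "index at or after event.resultIndex", which is an assumption about the wording above. It only reads properties that the event already exposes (results, resultIndex, isFinal, transcript, confidence).

```javascript
// Hypothetical diagnostic helper: summarizes a SpeechRecognitionEvent-like object
// as one plain row per result, so the two returns can be compared as text.
function describeResults(event) {
  const rows = [];
  for (let i = 0; i < event.results.length; ++i) {
    const result = event.results[i];
    rows.push({
      index: i,
      isFinal: result.isFinal,
      // assumption: "beyond resultIndex" means index >= event.resultIndex
      changed: i >= event.resultIndex,
      transcript: result[0].transcript,
      confidence: result[0].confidence
    });
  }
  return rows;
}

// Possible use inside the handler (not executed here):
//   recognition.onresult = function (e) { console.table(describeResults(e)); };
```

Logging these rows on both desktop and mobile Chrome should make it easier to see which fields actually differ between the two returns.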

I would simply take only the second return, but the problem is that the second return occurs on mobile Chrome and not on desktop Chrome. Desktop Chrome produces only the first return.

So the question is: is this behavior by design? If so, how can a single final phrase be identified in a way that works the same on all devices?

Or maybe this is some kind of error, such as a sound echo? In that case, the question is how to detect or avoid the echo.

UPDATE

The HTML is as follows:

<input id="recbutton" type="button" value="Recognize">
<div id="output">

  <div>
    Initial text
  </div>

</div>

The code is as follows:

var recognition = null;
var recognitionStarted = false;
var printcount = 1;
var lastPhrase = null;

$(function() {
  attachRecognition();
});

$('#recbutton').click(function() {
  if (!recognitionStarted) {
    recognition.start();
  }
  else {
    recognition.stop();
  }
});

function printOut(text) {
  var id = 'printcount' + printcount;
  printcount++;

  $('#output').append(
    "<div id='" + printcount + "'>" + text + "</div>"
  );

  $("#output").animate({ scrollTop: $("#output").prop('scrollHeight') });

  return printcount;
}


function attachRecognition() {

  if (!('webkitSpeechRecognition' in window)) {

    // The button is an <input>, so select it by id rather than by the 'button' tag name
    $('#recbutton').prop('disabled', true);

    recognition = null;

  } else {
    $('#recbutton').prop('disabled', false);

    recognition = new webkitSpeechRecognition();

    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.lang = "en-US";

    recognition.onstart = function(event) {
      recognitionStarted = true;
      printOut("speech recognition started");
    };

    recognition.onend = function(event) {
      recognitionStarted = false;
      printOut("speech recognition stopped");
    };

    recognition.onresult = function(event) {

      var finalPhrase = '';
      var interimPhrase = '';
      var result;
      var printcount;

      for(var i=0; i<event.results.length; ++i) {
        result = event.results[i];
        if( result.isFinal ) {
          finalPhrase = finalPhrase.trim() + ' ' + result[0].transcript;
        }
        else {
          interimPhrase = interimPhrase.trim() + ' ' + result[0].transcript;
        }
      }

      if( !lastPhrase ) {
        printcount = printOut('');
        lastPhrase = $('#' + printcount);
      }

      lastPhrase.html(finalPhrase.trim() + ' ' + interimPhrase.trim());

      if( finalPhrase.trim() ) {
        lastPhrase = null;
      }


    };
  }
}

JsFiddle: https://jsfiddle.net/dimskraft/envwao8o/1/

asked Jan 31 '16 10:01 by Dims



1 Answer

On mobile Chrome, the result.isFinal property seems to be buggy, or at least to behave differently than on desktop Chrome. A possible workaround is to also check the confidence attribute of the (first) alternative:

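function onResultHandler(event) {
    // resultIndex is the lowest index in event.results that changed in this event
    let i = event.resultIndex;
    let result = event.results[i];
    // treat the result as final only if its first alternative has non-zero confidence
    let isFinal = result.isFinal && (result[0].confidence > 0);
}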
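The same check can be pulled out into a small function so that it can be tried against hand-made objects without a microphone. This is only a sketch: isReallyFinal and fakeResult are hypothetical names, and it assumes that the spurious final results on mobile Chrome carry a confidence of 0, which is what the check above relies on.

```javascript
// Hypothetical helper: a result counts as final only if the browser marks it
// final AND its first alternative has a non-zero confidence.
function isReallyFinal(result) {
  return Boolean(result.isFinal && result.length > 0 && result[0].confidence > 0);
}

// Stand-in for a SpeechRecognitionResult: an array of alternatives plus isFinal.
function fakeResult(transcript, confidence, isFinal) {
  return Object.assign([{ transcript: transcript, confidence: confidence }], { isFinal: isFinal });
}

console.log(isReallyFinal(fakeResult('hello', 0.92, true)));  // true
console.log(isReallyFinal(fakeResult('hello', 0, true)));     // false
console.log(isReallyFinal(fakeResult('hel', 0.5, false)));    // false
```

Whether a confidence of 0 reliably marks the unwanted results is something to verify on the target devices.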

It also looks like the final result is sometimes emitted twice (with the same confidence value). In that case you may want to debounce it, or just process the first event, like this:

if (isFinal) {
    const transcript = result[0].transcript;

    // same phrase as the previous final result: skip the repeated event
    if (transcript === lastDebounceTranscript) {
        return;
    }

    lastDebounceTranscript = transcript;
}

where lastDebounceTranscript is a variable that you declare and initialize outside the scope of the event handler, so that it keeps its value between events.
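Putting the two pieces together, one way to keep lastDebounceTranscript outside the handler without a global is a closure. The following is a minimal sketch, not a definitive implementation; createFinalPhraseFilter and nextFinalPhrase are hypothetical names, and it still assumes that unwanted final results have a confidence of 0.

```javascript
// Sketch: builds a filter that returns each final phrase once, or null when
// the event carries nothing new (interim, zero confidence, or a repeat).
function createFinalPhraseFilter() {
  let lastDebounceTranscript = null; // survives between events via the closure

  return function nextFinalPhrase(event) {
    const result = event.results[event.resultIndex];
    if (!result) {
      return null;
    }

    const isFinal = result.isFinal && result[0].confidence > 0;
    if (!isFinal) {
      return null;
    }

    const transcript = result[0].transcript;
    if (transcript === lastDebounceTranscript) {
      return null; // same phrase emitted again: ignore it
    }

    lastDebounceTranscript = transcript;
    return transcript;
  };
}

// Possible wiring (not executed here):
//   const nextFinalPhrase = createFinalPhraseFilter();
//   recognition.onresult = function (e) {
//     const phrase = nextFinalPhrase(e);
//     if (phrase !== null) { /* handle the phrase */ }
//   };
```

Note that, like the snippet above, this also drops a phrase that the user deliberately says twice in a row; resetting the stored transcript after a short timeout would be one way around that.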

answered Sep 23 '22 18:09 by u.dev