I found that the speech recognition API duplicates result phrases on my Android device (and does not duplicate them on desktop).
For each phrase spoken, it returns two results. The first one is
and the second one is
As you can see, in the second return the phrase is duplicated: each copy is marked as final, and the second copy is beyond resultIndex. In the first return there is only one copy; it is final and it is beyond resultIndex.
I would take only the second return, but the problem is that this happens on mobile Chrome and not on desktop Chrome. Desktop Chrome returns only the first return.
So the question is: is this behavior by design? If so, how can I identify a single final phrase in a way that works on all devices?
Or maybe this is some kind of error, such as sound echo. In that case, how can I avoid or detect the echo?
UPDATE
The HTML is as follows:
<input id="recbutton" type="button" value="Recognize">
<div id="output">
<div>
Initial text
</div>
</div>
The code is as follows:
var recognition = null;
var recognitionStarted = false;
var printcount = 1;
var lastPhrase = null;

$(function() {
    attachRecognition();
    // Toggle recognition on each click of the button.
    $('#recbutton').click(function() {
        if (!recognitionStarted) {
            recognition.start();
        } else {
            recognition.stop();
        }
    });
});

// Appends a new line to #output and returns the id of the created element.
function printOut(text) {
    var id = 'printcount' + printcount;
    printcount++;
    $('#output').append(
        "<div id='" + id + "'>" + text + "</div>"
    );
    $('#output').animate({ scrollTop: $('#output').prop('scrollHeight') });
    return id;
}

function attachRecognition() {
    if (!('webkitSpeechRecognition' in window)) {
        // The button is an <input>, so select it by id rather than by tag name.
        $('#recbutton').prop('disabled', true);
        recognition = null;
    } else {
        $('#recbutton').prop('disabled', false);
        recognition = new webkitSpeechRecognition();
        recognition.continuous = true;
        recognition.interimResults = true;
        recognition.lang = "en-US";

        recognition.onstart = function(event) {
            recognitionStarted = true;
            printOut("speech recognition started");
        };

        recognition.onend = function(event) {
            recognitionStarted = false;
            printOut("speech recognition stopped");
        };

        recognition.onresult = function(event) {
            var finalPhrase = '';
            var interimPhrase = '';
            var result;
            var lineId;

            // Note: this walks every result in the list, not only those from event.resultIndex on.
            for (var i = 0; i < event.results.length; ++i) {
                result = event.results[i];
                if (result.isFinal) {
                    finalPhrase = finalPhrase.trim() + ' ' + result[0].transcript;
                } else {
                    interimPhrase = interimPhrase.trim() + ' ' + result[0].transcript;
                }
            }

            // Start a new output line if the previous phrase was finalized.
            if (!lastPhrase) {
                lineId = printOut('');
                lastPhrase = $('#' + lineId);
            }

            lastPhrase.html(finalPhrase.trim() + ' ' + interimPhrase.trim());

            // Once a final phrase has arrived, the next result goes on a new line.
            if (finalPhrase.trim()) {
                lastPhrase = null;
            }
        };
    }
}
JsFiddle: https://jsfiddle.net/dimskraft/envwao8o/1/
The results that Chrome mobile provides for the result.isFinal property seem to be buggy, or at least differ from those on Chrome desktop. A possible workaround is to check the confidence attribute of the (first) alternative:
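function onResultHandler(event) {
  // Look only at the result that changed in this event.
  let i = event.resultIndex;
  let result = event.results[i];
  // Treat the result as final only when the first alternative has a non-zero confidence.
  let isFinal = result.isFinal && (result[0].confidence > 0);
}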
It also looks like the final result is sometimes emitted twice (with the same confidence value). In that case you may want to debounce it, or just process the first event, like this:
if (isFinal) {
  const transcript = result[0].transcript;
  // Ignore a final result identical to the previous one.
  if (transcript === lastDebounceTranscript) {
    return;
  }
  lastDebounceTranscript = transcript;
}
where lastDebounceTranscript is a variable that you initialize outside the scope of the event handler.
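Putting the two checks together, a minimal sketch might look like the following. The factory name createFinalPhraseFilter and the onFinalPhrase callback are hypothetical names introduced here for illustration; only resultIndex, results, isFinal, confidence and transcript come from the speech recognition event itself. Whether a confidence of 0 reliably marks the spurious result on every device is an assumption taken from the workaround above.

```javascript
// Hypothetical helper (not part of the Web Speech API): builds an onresult
// handler that reports each final phrase at most once.
function createFinalPhraseFilter(onFinalPhrase) {
  // Kept outside the handler so that it survives between events.
  let lastDebounceTranscript = null;

  return function onResultHandler(event) {
    const result = event.results[event.resultIndex];
    if (!result) {
      return;
    }
    // Assumption from the workaround above: a "final" result whose first
    // alternative has zero confidence is not treated as final.
    const isFinal = result.isFinal && result[0].confidence > 0;
    if (!isFinal) {
      return;
    }
    const transcript = result[0].transcript.trim();
    // Skip a final result that repeats the previous one.
    if (transcript === lastDebounceTranscript) {
      return;
    }
    lastDebounceTranscript = transcript;
    onFinalPhrase(transcript);
  };
}
```

With the code from the question, this could be attached as recognition.onresult = createFinalPhraseFilter(printOut). One trade-off of this approach: if the user really does say the same phrase twice in a row, the second occurrence is dropped as well, so you may want to reset lastDebounceTranscript after a short timeout.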