 

Issues with Web Speech API in Android Chrome

I'm trying to make use of the SpeechRecognition interface of the Web Speech API. It works fine on the desktop version of Chrome but I can't get it to detect any audio on the Android version. After failing to get my own code to work I tested this demo as well as this other demo on two different Android devices (one running LineageOS Nougat, one running LineageOS Pie, both with Chrome 79) but neither demo worked on either device.

I'm not sure what's wrong here... can anyone else get these demos working on Android? I am serving my test page over HTTPS, and I can record audio from the microphone on these devices just fine using navigator.mediaDevices.getUserMedia, so it doesn't seem to be a hardware, permission, or security issue.

The specific symptoms I'm seeing are as follows:

  • The start event fires as expected when recognition is first started, but the audiostart, soundstart, speechstart and result events that should follow it never fire.

  • Attempting to call SpeechRecognition.stop seems to have no effect — the end event does not get fired. Calling SpeechRecognition.start after a stop attempt throws Uncaught DOMException: Failed to execute 'start' on 'SpeechRecognition': recognition has already started.

  • Calling SpeechRecognition.abort does fire the end event and allows the recognition to be restarted.
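Since abort does recover the recognizer while stop does not, one possible workaround is a watchdog that aborts when audiostart never arrives. The following is only a minimal sketch under assumptions: startWithWatchdog, timeoutMs and onStuck are hypothetical names that are not part of the Web Speech API, and a suitable timeout value is a guess that would need tuning per device.

```javascript
// Hypothetical helper (not part of the Web Speech API): starts recognition and
// aborts it if no 'audiostart' event arrives within timeoutMs milliseconds.
function startWithWatchdog(recognition, timeoutMs, onStuck) {
  let timer = null;

  // Cancel the pending watchdog timer, if any.
  const clear = () => {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
  };

  // Audio capture began, so the recognizer is not stuck.
  const onAudioStart = () => clear();

  // Recognition ended (normally or via abort): clean up the listeners.
  const onEnd = () => {
    clear();
    recognition.removeEventListener('audiostart', onAudioStart);
    recognition.removeEventListener('end', onEnd);
  };

  recognition.addEventListener('audiostart', onAudioStart);
  recognition.addEventListener('end', onEnd);

  timer = setTimeout(() => {
    timer = null;
    onStuck();
    // stop() had no effect in the stuck state described above; abort() fired 'end'.
    recognition.abort();
  }, timeoutMs);

  try {
    recognition.start();
  } catch (err) {
    onEnd(); // start() threw (for example, already started): undo the setup
    throw err;
  }
}
```

In the test page below, the click handler could call startWithWatchdog(recognition, 5000, () => { output.value += '\nNo audio detected, aborting'; }) instead of recognition.start(). The 5000 ms value is arbitrary.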

Here's some test code based on the example from MDN.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title> Web Speech API Test </title>
    <style>
      * { box-sizing: border-box; }

      html {
        height: 100%;
        width: 100%;
      }

      body {
        height: 100%;
        width: 100%;
        padding: 0;
        margin: 0;
        display: grid;
        grid-template-columns: 1fr;
        grid-template-rows: 1fr 10fr 1fr;
        font-family: sans-serif;
      }

      h1 {
        margin: 0;
        padding: 0.5rem;
        background-color: dodgerblue;
        text-align: center;
      }

      #output {
        margin: 0;
        padding: 0.5em;
        border: 0;
        background-color: transparent;
      }

      #start {
        display: block;
        background-color: dodgerblue;
        border: 0;
        color: navy;
        font-weight: bold;
        font-size: 1.2em;
      }
    </style>
  </head>
  <body>
    <h1> Web Speech API Test </h1>
    <textarea id="output"></textarea>
    <button id="start"> START </button>
    <script>
      let SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
      let SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
      let SpeechRecognitionEvent = window.SpeechRecognitionEvent || window.webkitSpeechRecognitionEvent;

      let grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige | bisque | black | blue | brown | chocolate | coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | gray | green | indigo | ivory | khaki | lavender | lime | linen | magenta | maroon | moccasin | navy | olive | orange | orchid | peru | pink | plum | purple | red | salmon | sienna | silver | snow | tan | teal | thistle | tomato | turquoise | violet | white | yellow ;';

      let recognition = new SpeechRecognition();
      let speechRecognitionList = new SpeechGrammarList();
      speechRecognitionList.addFromString(grammar, 1);

      recognition.grammars = speechRecognitionList;
      recognition.continuous = false;
      recognition.lang = 'en-US';
      recognition.interimResults = false;
      recognition.maxAlternatives = 1;

      let startButton = document.getElementById('start');
      let output = document.getElementById('output');
      output.value += 'Initializing...';

      let listening = false;

      startButton.addEventListener('click', event => {
        if (listening == false) {
          recognition.start();
          startButton.innerHTML = 'STOP';
          listening = true;
        } else {
          // recognition.stop() never fires 'end' on the affected devices, so abort() is used instead.
          recognition.abort();
        }
      });

      console.dir(recognition);
      output.value += 'ready.';

      recognition.onstart = event => {
        output.value += '\nRecognition started';
      };

      recognition.onaudiostart = event => {
        output.value += '\nAudio started';
      };

      recognition.onsoundstart = event => {
        output.value += '\nSound started';
      };

      recognition.onspeechstart = event => {
        output.value += '\nSpeech started';
      };

      recognition.onspeechend = event => {
        output.value += '\nSpeech ended';
        recognition.stop();
      };

      recognition.onsoundend = event => {
        output.value += '\nSound ended';
      };

      recognition.onaudioend = event => {
        output.value += '\nAudio ended';
      };

      recognition.onend = event => {
        output.value += '\nRecognition stopped';
        startButton.innerHTML = 'START';
        listening = false;
      };

      recognition.onresult = event => {
        let color = event.results[0][0].transcript;
        let confidence = event.results[0][0].confidence;
        document.body.style.backgroundColor = color;
        output.value += '\nResult received: ' + color;
        output.value += '\nConfidence: ' + confidence;
      };

      recognition.onnomatch = event => {
        output.value += '\nColor not recognised';
      };

      recognition.onerror = event => {
        output.value += '\nERROR: ' + event.error;
      };
    </script>
  </body>
</html>

Any ideas as to what the problem could be would be appreciated.

UPDATE 2021-01-08:

I modified the example code so that it outputs log messages to a textarea element instead of the console in order to eliminate the need for remote debugging. I also published a live version on my domain. I then tested it using Chrome Canary 89 on LineageOS Oreo and found that it still did not work there. However, I then found that this example DOES work perfectly on a Razer Phone running its official version of Android Pie and Chrome 87! So it would seem that my Web Speech implementation is fine, and that there is possibly some other issue with LineageOS that has existed for multiple versions.

This question has received a fair number of views, so I imagine others must be having similar issues. To those people, I suggest you try the live test on a few different devices and report your findings back here. Maybe we can narrow down the conditions that are causing it to fail on some devices but not others. Possibly this has nothing to do with LineageOS at all but is another issue altogether.
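For anyone collecting results on other devices, it may help to record which events fire and when, rather than adding one handler per event. This is a rough sketch only: recordEvents and RECOGNITION_EVENTS are hypothetical names introduced here, and the list simply mirrors the handlers used in the test page above.

```javascript
// Event names handled in the test page above.
const RECOGNITION_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart', 'result', 'nomatch',
  'error', 'speechend', 'soundend', 'audioend', 'end',
];

// Hypothetical helper: appends one entry per recognition event to `log`,
// with the offset in milliseconds from the moment recording began.
function recordEvents(recognition, log = []) {
  const t0 = Date.now();
  for (const name of RECOGNITION_EVENTS) {
    recognition.addEventListener(name, (event) => {
      const entry = { event: name, ms: Date.now() - t0 };
      // Error events carry an error code such as 'network' or 'not-allowed'.
      if (name === 'error') entry.error = event.error;
      log.push(entry);
    });
  }
  return log;
}
```

After a test run, JSON.stringify(log) together with navigator.userAgent gives a compact report to post back here. On the affected devices the log would presumably contain only the start entry.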

Besworks asked Nov 06 '22 09:11



1 Answer

The Web Speech API on Android relies on a third-party service that is usually provided by Google (Play Services) and/or the device manufacturer (e.g. Samsung). LineageOS most likely ships without this service, or with it disabled, because the service usually connects to a cloud server for transcription.
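One consequence is that ordinary feature detection cannot tell the two cases apart: in the question, the constructor exists and the start event fires even on the devices where nothing else happens. A minimal sketch of the usual constructor check, using a hypothetical helper name (getRecognitionConstructor), looks like this:

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor exposed by
// the given global object (normally `window`), or null if there is none.
// The unprefixed name is preferred over the webkit-prefixed one.
function getRecognitionConstructor(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}
```

A non-null result only shows that Chrome exposes the interface. Going by the symptoms in the question, whether a recognition service is actually behind it can only be inferred at run time, for example from audiostart never firing after start.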

Flow answered Nov 14 '22 23:11
