
How to implement Mozilla DeepSpeech into PHP web app to convert Speech-to-text?

I have a PHP web application and am looking for an open source, high-accuracy speech-to-text implementation that will take voice commands from users to open web pages. Examples: "Make Sales" (this will open the Create Sales PHP page), "Make Purchase order", "Open END-OF-DAY reports", etc.

My question:

I want to know if we can use Mozilla DeepSpeech to take .wav audio from a Firefox browser and return the speech as text. If yes, what is the flow from recording voice in Firefox using the mic to converting it to text using the DeepSpeech engine?

How can I implement a wake-up/launch phrase, similar to "OK Google", that leaves the app ready to listen for commands?

Priyesh asked May 29 '18 10:05




1 Answer

You can achieve that by setting up a speech-to-text server and exchanging data with it using asynchronous requests (AJAX) or WebSockets.

You can find server installation instructions at the link below:

https://pypi.org/project/deepspeech-server/

After you have installed the server, you can start making requests from any browser that supports the WebRTC getUserMedia() API. Record the audio as Blob data and send it in base64 format to your PHP backend. On the backend, decode it and save it to a temporary audio file:

// Decode the base64 payload sent by the browser; strict mode rejects invalid input
$audioData = base64_decode($data, true);
if ($audioData === false) {
    throw new InvalidArgumentException('Invalid base64 audio data');
}

// Write the decoded audio out to the temporary file
file_put_contents($full_file_path, $audioData);

Then convert the audio file to text by making a cURL request to your own Mozilla DeepSpeech server (the deepspeech-server package linked above):

curl -X POST --data-binary @testfile.wav http://localhost:8080/stt
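
From PHP, the same request could be made with the cURL extension. This is only a minimal sketch: the function name transcribeWav() is hypothetical, the endpoint is the one from the command above, and the response body is returned unparsed because its exact format depends on the server version you install.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: POST a WAV file as the raw request body,
 * mirroring `curl -X POST --data-binary @file.wav <endpoint>`.
 * Returns the raw response body from the speech-to-text server.
 */
function transcribeWav(string $wavPath, string $endpoint = 'http://localhost:8080/stt'): string
{
    // Read the whole file; fail early if it is missing or unreadable.
    $audio = is_readable($wavPath) ? file_get_contents($wavPath) : false;
    if ($audio === false) {
        throw new RuntimeException("Cannot read audio file: $wavPath");
    }

    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        // A string here is sent as-is as the request body (binary-safe).
        CURLOPT_POSTFIELDS     => $audio,
        // Return the response instead of printing it.
        CURLOPT_RETURNTRANSFER => true,
        // Assumed timeout; recognition of long clips may need more.
        CURLOPT_TIMEOUT        => 30,
    ]);

    $response = curl_exec($ch);
    $error    = curl_error($ch);
    $status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($response === false || $status !== 200) {
        throw new RuntimeException("Speech-to-text request failed (HTTP $status): $error");
    }

    return $response;
}
```

Calling transcribeWav($full_file_path) right after saving the temporary file would give you the recognized text to pass on to your command handling.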

Create methods on your backend that scan the generated text for keywords/commands. When a command is triggered, send it back to the front end. If you only want to let users dictate long messages with their voice, return the whole text every time instead. Even then you will still want to "listen" for keywords, so that users can set punctuation and start and finish writing.
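
One simple way to do the keyword matching is a lookup table of normalized phrases. The sketch below is an assumption-laden illustration: the phrases come from the question, but the page paths and the names VOICE_COMMANDS, normalizeTranscript() and matchCommand() are all made up for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical command table: normalized spoken phrase => page to open.
const VOICE_COMMANDS = [
    'make sales'              => '/sales/create.php',
    'make purchase order'     => '/purchase-orders/create.php',
    'open end of day reports' => '/reports/end-of-day.php',
];

/** Lowercase the text and collapse punctuation/whitespace into single spaces. */
function normalizeTranscript(string $text): string
{
    $text = preg_replace('/[^a-z0-9]+/', ' ', strtolower($text)) ?? '';
    return trim($text);
}

/** Return the page for the first command phrase found in the transcript, or null. */
function matchCommand(string $transcript, array $commands = VOICE_COMMANDS): ?string
{
    // Pad with spaces so phrases only match on whole-word boundaries.
    $haystack = ' ' . normalizeTranscript($transcript) . ' ';
    foreach ($commands as $phrase => $page) {
        if (strpos($haystack, ' ' . $phrase . ' ') !== false) {
            return $page;
        }
    }
    return null;
}
```

With this table, matchCommand('Open END-OF-DAY reports') returns '/reports/end-of-day.php', and a transcript with no known phrase returns null, in which case you can fall back to returning the whole text. Exact phrase matching is brittle against recognition errors, so a fuzzier comparison may be worth trying if commands are often missed.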

Happy coding everyone ;)

SergeDirect answered Oct 20 '22 00:10