
Unable to get results from Google speech-to-text API while streaming audio from the web

I want to stream audio from the web and convert it to text using the Python google-cloud-speech API. I have integrated it into my Django Channels code.

For the frontend, I directly copied this code, and the backend has the code shown below. The problem: I am not getting any exceptions or errors, but I am also not getting any results from the Google API.

What I tried:

  • I put breakpoints inside the for loop of the process function; control never reaches the loop body.

  • I went through the Java code here and tried to understand it. I set that Java code up locally and debugged it. One thing I learned is that in the Java code, the onWebSocketBinary method receives an integer array; from the frontend we send it like this:

      socket.send(Int16Array.from(floatSamples.map(function (n) {return n * MAX_INT;})));

  • In Java, they convert the data into a ByteString and then send it to Google. In Django, I put breakpoints and noticed that I receive the data as a binary string, so I felt I didn't need to do anything with it. Still, I tried several ways of converting it to an integer array, but that didn't work because Google expects bytes (you can see the commented-out code below).

  • I went through this example code and this one from Google, and I am doing the same thing. I don't understand what I am doing wrong here.
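To make the byte layout from the bullets above concrete, here is a minimal sketch of what an Int16Array frame looks like once it arrives as bytes in Python. It assumes MAX_INT on the frontend is 32767 and that the browser runs on a little-endian machine (both are assumptions, not confirmed by the frontend code); the helper names are hypothetical.

```python
import struct

MAX_INT = 2 ** 15 - 1  # assumed value of the frontend's MAX_INT constant


def floats_to_pcm16(samples):
    """Mimic the browser side: scale floats in [-1, 1], truncate, pack as little-endian int16."""
    ints = [int(s * MAX_INT) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def pcm16_to_ints(payload):
    """Unpack a binary WebSocket frame into signed 16-bit samples (for inspection only)."""
    count = len(payload) // 2
    return list(struct.unpack("<%dh" % count, payload[: count * 2]))


frame = floats_to_pcm16([0.0, 0.5, -1.0])
print(len(frame))            # 6: two bytes per sample
print(pcm16_to_ints(frame))  # [0, 16383, -32767]
```

If this reading is right, the bytes received in the consumer are already raw LINEAR16 samples, and decoding them to integers is only useful for checking that the audio is not silent.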

Django Code:

    import json

    from channels.generic.websocket import WebsocketConsumer

    # Imports the Google Cloud client library
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    # Instantiates a client
    client = speech.SpeechClient()
    language_code = "en-US"
    streaming_config = None


    class SpeechToTextConsumer(WebsocketConsumer):
        def connect(self):
            self.accept()

        def disconnect(self, close_code):
            pass

        def process(self, streaming_recognize_response: types.StreamingRecognitionResult):
            for response in streaming_recognize_response:
                if not response.results:
                    continue
                result = response.results[0]
                self.send(text_data=json.dumps(result))

        def receive(self, text_data=None, bytes_data=None):
            global streaming_config
            if text_data:
                data = json.loads(text_data)
                rate = data["sampleRate"]
                config = types.RecognitionConfig(
                    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
                    sample_rate_hertz=rate,
                    language_code=language_code,
                )
                streaming_config = types.StreamingRecognitionConfig(
                    config=config, interim_results=True, single_utterance=False
                )
                types.StreamingRecognizeRequest(streaming_config=streaming_config)
                self.send(text_data=json.dumps({"message": "processing..."}))
            if bytes_data:
                # bytes_data = bytes_data[math.floor(len(bytes_data) / 2) :]
                # bytes_data = bytes_data.lstrip(b"\x00")
                # bytes_data = int.from_bytes(bytes_data, "little")
                stream = [bytes_data]
                requests = (
                    types.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
                )
                responses = client.streaming_recognize(streaming_config, requests)
                self.process(responses)
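One thing that stands out in the code above is that every binary frame opens a new streaming_recognize call containing a single chunk. Whether that is the cause of the empty results is an assumption, not a confirmed diagnosis. A common alternative is one long-lived stream per connection, fed from a queue. The sketch below shows only that pattern, using the standard library; StreamSession, chunk_generator and fake_streaming_recognize are hypothetical names, and the fake recognizer stands in for the real API call.

```python
import queue
import threading


def chunk_generator(chunks):
    """Yield audio chunks from a queue until a None sentinel marks end of stream."""
    while True:
        chunk = chunks.get()
        if chunk is None:
            return
        yield chunk


def fake_streaming_recognize(requests):
    """Hypothetical stand-in for the real API call: emits one 'response' per chunk."""
    for chunk in requests:
        yield "received %d bytes" % len(chunk)


class StreamSession:
    """One long-lived stream per WebSocket connection, fed by receive()."""

    def __init__(self, recognize=fake_streaming_recognize):
        self.chunks = queue.Queue()
        self.results = []
        # The worker consumes responses while receive() keeps adding chunks.
        self.worker = threading.Thread(target=self._run, args=(recognize,), daemon=True)
        self.worker.start()

    def _run(self, recognize):
        for response in recognize(chunk_generator(self.chunks)):
            self.results.append(response)

    def feed(self, bytes_data):
        self.chunks.put(bytes_data)

    def close(self):
        self.chunks.put(None)  # sentinel: ends the generator and the stream
        self.worker.join()


session = StreamSession()
session.feed(b"\x00\x01" * 4)
session.feed(b"\x00\x01" * 8)
session.close()
print(session.results)  # ['received 8 bytes', 'received 16 bytes']
```

In the consumer, the recognize argument would presumably wrap client.streaming_recognize(streaming_config, requests), with each chunk turned into types.StreamingRecognizeRequest(audio_content=chunk); feed would be called from receive and close from disconnect.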
Lokesh Sanapalli asked May 09 '19 17:05


1 Answer

I ran into a similar issue while building a virtual AI assistant, and I believe I can offer at least a bit of help. I am in no way an expert, but I did find a way to use Google's speech recognition engine. I used Python's SpeechRecognition library (install it with pip install SpeechRecognition) and imported it with import speech_recognition as sr. From there, you create an sr.Recognizer and call its recognize_google(audio) method, where audio is the recorded audio data. You do not need an account, because the library ships with a default API key, and it is easy to set up wherever you need it (such as in Django).

Here is a really helpful link to a tutorial that I recommend. Here is a link to the documentation. Here is a helpful program that takes an audio file and transcribes it using all of the available speech recognition services. The code is below; you can use whichever service you like. Sphinx runs offline, and Google's API doesn't require signup because the library's default key is used, which the code comments describe as intended for testing.

    #!/usr/bin/env python3

    import speech_recognition as sr

    # obtain path to "english.wav" in the same folder as this script
    from os import path
    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
    # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
    # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

    # use the audio file as the audio source
    r = sr.Recognizer()
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)  # read the entire audio file

    # recognize speech using Sphinx
    try:
        print("Sphinx thinks you said " + r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))

    # recognize speech using Google Speech Recognition
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

    # recognize speech using Google Cloud Speech
    GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
    try:
        print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
    except sr.UnknownValueError:
        print("Google Cloud Speech could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Cloud Speech service; {0}".format(e))

    # recognize speech using Wit.ai
    WIT_AI_KEY = "INSERT WIT.AI API KEY HERE"  # Wit.ai keys are 32-character uppercase alphanumeric strings
    try:
        print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
    except sr.UnknownValueError:
        print("Wit.ai could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Wit.ai service; {0}".format(e))

    # recognize speech using Microsoft Azure Speech
    AZURE_SPEECH_KEY = "INSERT AZURE SPEECH API KEY HERE"  # Microsoft Speech API keys 32-character lowercase hexadecimal strings
    try:
        print("Microsoft Azure Speech thinks you said " + r.recognize_azure(audio, key=AZURE_SPEECH_KEY))
    except sr.UnknownValueError:
        print("Microsoft Azure Speech could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Microsoft Azure Speech service; {0}".format(e))

    # recognize speech using Microsoft Bing Voice Recognition
    BING_KEY = "INSERT BING API KEY HERE"  # Microsoft Bing Voice Recognition API keys 32-character lowercase hexadecimal strings
    try:
        print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
    except sr.UnknownValueError:
        print("Microsoft Bing Voice Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

    # recognize speech using Houndify
    HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE"  # Houndify client IDs are Base64-encoded strings
    HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE"  # Houndify client keys are Base64-encoded strings
    try:
        print("Houndify thinks you said " + r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY))
    except sr.UnknownValueError:
        print("Houndify could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Houndify service; {0}".format(e))

    # recognize speech using IBM Speech to Text
    IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
    try:
        print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
    except sr.UnknownValueError:
        print("IBM Speech to Text could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from IBM Speech to Text service; {0}".format(e))
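The script above reads a complete WAV file, while the question receives raw 16-bit PCM chunks over a WebSocket. One way to bridge the two, sketched here with only the standard library, is to wrap the collected chunks in a WAV container first. The function name pcm16_to_wav is hypothetical, and mono audio is an assumption about the microphone stream.

```python
import io
import wave


def pcm16_to_wav(chunks, sample_rate, target):
    """Write raw mono 16-bit PCM chunks into a WAV container at `target` (path or file object)."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumption: mono microphone audio
        wav_file.setsampwidth(2)   # 16-bit samples, 2 bytes each
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)


# Two fake chunks of 100 and 60 samples, written to an in-memory buffer.
buffer = io.BytesIO()
pcm16_to_wav([b"\x00\x00" * 100, b"\x01\x00" * 60], 16000, buffer)

# Read the header back to confirm the container is well formed.
buffer.seek(0)
with wave.open(buffer, "rb") as check:
    frames, rate = check.getnframes(), check.getframerate()
print(frames, rate)  # 160 16000
```

Writing to a file path instead of the buffer would give a .wav file that can take the place of english.wav in the script. Note that this transcribes a finished recording rather than a live stream, so it does not produce interim results.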

Hope this helped in some way!

Mason Choi answered Sep 20 '22 16:09