Unable to get results from Google text to speech api while streaming audio from web

I want to stream audio from the web and convert it to text using the Python google-cloud-speech API. I have integrated it into my Django Channels code.

For the frontend, I copied this code directly, and the backend has the code shown below. Now, the problem: I am not getting any exceptions or errors, but I am also not getting any results from the Google API.

What I tried:

I put breakpoints inside the for loop of the process function; control never reaches inside the loop.
I went through the Java code here and tried to understand it. I set up that Java code locally and debugged it. One thing I understood is that, in the Java code, the method onWebSocketBinary receives an integer array; from the frontend we send it like this:
```
  socket.send(Int16Array.from(floatSamples.map(function (n) {return n * MAX_INT;}))); 
```
In Java, they convert it into a ByteString and then send it to Google. In Django, I put breakpoints and noticed that I am receiving the data as a binary string, so I felt I didn't need to do anything with it. Still, I tried several ways of converting it to an integer array, but that didn't work because Google expects bytes (you can see the commented-out code below).
I went through this example code and this one from Google, and I am doing the same thing; I don't understand what I am doing wrong here.
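To check that the binary frames really are what LINEAR16 expects (uncompressed 16-bit signed little-endian samples), they can be decoded on the Python side and inspected. The following is only a minimal sketch: `float_samples_to_linear16` and `inspect_linear16` are hypothetical helper names, `MAX_INT = 32767` is an assumed value for the frontend constant, and little-endian byte order is assumed because `Int16Array` uses the platform byte order of the machine the browser runs on.

```python
import struct

MAX_INT = 32767  # assumption: the value of the frontend's MAX_INT constant


def float_samples_to_linear16(float_samples):
    """Mimic the frontend: scale floats in [-1, 1] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * MAX_INT) for s in float_samples]
    return struct.pack("<%dh" % len(ints), *ints)


def inspect_linear16(frame):
    """Decode one binary WebSocket frame as little-endian int16 and summarise it."""
    if len(frame) % 2:
        raise ValueError("LINEAR16 frames must contain an even number of bytes")
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    peak = max((abs(s) for s in samples), default=0)
    return {"count": len(samples), "peak": peak}


# Simulate one frame as the frontend would send it, then inspect it.
frame = float_samples_to_linear16([0.0, 0.5, -1.0])
summary = inspect_linear16(frame)
```

Calling `inspect_linear16(bytes_data)` inside `receive` would show whether the frames have an even length and a non-zero peak; a peak of 0 would mean only silence is arriving.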

Django Code:

    import json

    from channels.generic.websocket import WebsocketConsumer

    # Imports the Google Cloud client library
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    # Instantiates a client
    client = speech.SpeechClient()
    language_code = "en-US"
    streaming_config = None


    class SpeechToTextConsumer(WebsocketConsumer):
        def connect(self):
            self.accept()

        def disconnect(self, close_code):
            pass

        def process(self, streaming_recognize_response: types.StreamingRecognitionResult):
            for response in streaming_recognize_response:
                if not response.results:
                    continue
                result = response.results[0]
                self.send(text_data=json.dumps(result))

        def receive(self, text_data=None, bytes_data=None):
            global streaming_config
            if text_data:
                data = json.loads(text_data)
                rate = data["sampleRate"]
                config = types.RecognitionConfig(
                    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
                    sample_rate_hertz=rate,
                    language_code=language_code,
                )
                streaming_config = types.StreamingRecognitionConfig(
                    config=config, interim_results=True, single_utterance=False
                )
                types.StreamingRecognizeRequest(streaming_config=streaming_config)
                self.send(text_data=json.dumps({"message": "processing..."}))
            if bytes_data:
                # bytes_data = bytes_data[math.floor(len(bytes_data) / 2) :]
                # bytes_data = bytes_data.lstrip(b"\x00")
                # bytes_data = int.from_bytes(bytes_data, "little")
                stream = [bytes_data]
                requests = (
                    types.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
                )
                responses = client.streaming_recognize(streaming_config, requests)
                self.process(responses)
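In the consumer above, every binary frame starts a new `streaming_recognize` call over a one-element list, so each call only ever sees a single short chunk. A pattern that may be worth trying instead is to keep one long-lived request generator per connection and feed it from `receive` through a queue, with the responses consumed on a separate thread. The sketch below illustrates only that bridging technique with the standard library; `AudioStream`, `run_session` and `fake_recognize` are hypothetical names, and `fake_recognize` is a stand-in for the real recognizer, not Google's API.

```python
import queue
import threading


class AudioStream:
    """Bridge per-message callbacks to one long-lived generator of audio chunks."""

    _CLOSE = object()  # sentinel that marks the end of the stream

    def __init__(self):
        self._chunks = queue.Queue()

    def push(self, chunk):
        """Called once per incoming binary frame (e.g. from receive)."""
        self._chunks.put(chunk)

    def close(self):
        """Called when the client disconnects."""
        self._chunks.put(self._CLOSE)

    def __iter__(self):
        while True:
            chunk = self._chunks.get()  # blocks until the next frame arrives
            if chunk is self._CLOSE:
                return
            yield chunk


def fake_recognize(chunks):
    """Hypothetical stand-in for a streaming recognizer: one result per chunk."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        yield {"bytes_seen": total}


def run_session(stream, on_result):
    """Consume results for the whole connection on a worker thread."""
    for result in fake_recognize(stream):
        on_result(result)


stream = AudioStream()
results = []
worker = threading.Thread(target=run_session, args=(stream, results.append))
worker.start()
for frame in (b"\x00\x01", b"\x02\x03\x04\x05"):
    stream.push(frame)  # what receive() would do with bytes_data
stream.close()          # what disconnect() would do
worker.join()
```

In the consumer, the assumption would be that the stream and worker thread are created once the configuration message arrives, `receive` only calls `push`, and `disconnect` calls `close`; whether this resolves the missing results has not been verified here.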

asked May 09 '19 17:05

Lokesh Sanapalli

1 Answer

I ran into a similar issue while creating a virtual assistant, and I believe I can offer at least a bit of help. I am in no way an expert, but I did find a way to use Google's speech-to-text engine. I used Python's SpeechRecognition library (you can install it with pip install SpeechRecognition) and imported it as "sr". From there, you create a recognizer with r = sr.Recognizer() and call Google's API with r.recognize_google(audio). You do not need an account, because the library ships with a default API key for testing, and it is easy to set up and use anywhere (such as in Django).

Here is a really helpful link to a tutorial on this that I really recommend. Here is a link to the documentation. Here is a helpful program that takes an audio file and transcribes it using all of the available speech recognition services. The code is below; you can use whichever service you like. Sphinx runs offline, and Google's API doesn't require signup because the library already includes a default key.

    #!/usr/bin/env python3

    import speech_recognition as sr

    # obtain path to "english.wav" in the same folder as this script
    from os import path
    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
    # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
    # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

    # use the audio file as the audio source
    r = sr.Recognizer()
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)  # read the entire audio file

    # recognize speech using Sphinx
    try:
        print("Sphinx thinks you said " + r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))

    # recognize speech using Google Speech Recognition
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

    # recognize speech using Google Cloud Speech
    GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
    try:
        print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
    except sr.UnknownValueError:
        print("Google Cloud Speech could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Cloud Speech service; {0}".format(e))

    # recognize speech using Wit.ai
    WIT_AI_KEY = "INSERT WIT.AI API KEY HERE"  # Wit.ai keys are 32-character uppercase alphanumeric strings
    try:
        print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
    except sr.UnknownValueError:
        print("Wit.ai could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Wit.ai service; {0}".format(e))

    # recognize speech using Microsoft Azure Speech
    AZURE_SPEECH_KEY = "INSERT AZURE SPEECH API KEY HERE"  # Microsoft Speech API keys 32-character lowercase hexadecimal strings
    try:
        print("Microsoft Azure Speech thinks you said " + r.recognize_azure(audio, key=AZURE_SPEECH_KEY))
    except sr.UnknownValueError:
        print("Microsoft Azure Speech could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Microsoft Azure Speech service; {0}".format(e))

    # recognize speech using Microsoft Bing Voice Recognition
    BING_KEY = "INSERT BING API KEY HERE"  # Microsoft Bing Voice Recognition API keys 32-character lowercase hexadecimal strings
    try:
        print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
    except sr.UnknownValueError:
        print("Microsoft Bing Voice Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

    # recognize speech using Houndify
    HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE"  # Houndify client IDs are Base64-encoded strings
    HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE"  # Houndify client keys are Base64-encoded strings
    try:
        print("Houndify thinks you said " + r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY))
    except sr.UnknownValueError:
        print("Houndify could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Houndify service; {0}".format(e))

    # recognize speech using IBM Speech to Text
    IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
    try:
        print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
    except sr.UnknownValueError:
        print("IBM Speech to Text could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from IBM Speech to Text service; {0}".format(e))

Hope this helped in some way!

answered Sep 20 '22 16:09

Mason Choi

Tags:

django

speech-to-text

google-speech-api

google-cloud-speech

django-channels
