I keep getting the error InvalidArgument: 400 in Google Speech-to-Text. The problem seems to be that I am using 2-channel (stereo) audio, while the API expects a mono WAV.
If I convert the file in an audio editor it might work, but I cannot use an audio editor to convert a batch of files. Is there a way to change the number of audio channels in either Python or Google Cloud?
Note: I already tried the wave module, but I kept getting an error #7 for an unrecognized file type (I couldn't read the WAV file with Python's wave module).
-ERROR- InvalidArgument: 400 Must use single channel (mono) audio, but WAV header indicates 2 channels.
Speech-to-Text can process up to 1 minute of speech audio data sent in a synchronous request. After Speech-to-Text processes and recognizes all of the audio, it returns a response. A synchronous request is blocking, meaning that Speech-to-Text must return a response before processing the next request.
Assuming you're using the google-cloud-speech library, you can set the audio_channel_count property in your RecognitionConfig to the number of channels in the input audio data (it defaults to one channel, i.e. mono). You could do something like this:
    from google.cloud import speech

    client = speech.SpeechClient()

    results = client.recognize(
        audio = speech.types.RecognitionAudio(
            uri = 'gs://your-bucket/recording.wav',
        ),
        config = speech.types.RecognitionConfig(
            encoding = 'LINEAR16',
            language_code = 'en-US',
            sample_rate_hertz = 44100,
            audio_channel_count = 2,
        ),
    )
See the API doc for further info.
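If you would rather convert the files to mono in Python, as the question asks, here is a minimal sketch using only the standard library. It assumes the input is 16-bit PCM stereo that the wave module can open, which may not hold for the asker's files given the error they mention. The function name stereo_to_mono and the recordings/ folder in the comment are made up for illustration.

```python
import array
import sys
import wave


def stereo_to_mono(src_path, dst_path):
    """Downmix a 16-bit PCM stereo WAV to mono by averaging both channels."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM audio")
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # WAV sample data is little-endian signed 16-bit, interleaved L, R, L, R, ...
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    # Average each left/right pair into a single mono sample.
    mono = array.array(
        "h", ((left + right) // 2 for left, right in zip(samples[0::2], samples[1::2]))
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())


# Batch use (hypothetical folder name):
# for path in pathlib.Path("recordings").glob("*.wav"):
#     stereo_to_mono(str(path), str(path.with_name(path.stem + "_mono.wav")))
```

Averaging the two channels is just one way to downmix. If only one channel carries speech, keeping that channel alone might give cleaner results.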
You can use the function below to read the channel count and frame rate dynamically. It takes the audio file path and returns the frame rate and the number of channels:
    import wave

    def frame_rate_channel(audio_file_name):
        print(audio_file_name)
        # Read the sample rate and channel count from the WAV header.
        with wave.open(audio_file_name, "rb") as wave_file:
            frame_rate = wave_file.getframerate()
            channels = wave_file.getnchannels()
            return frame_rate, channels
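As a rough sketch of how those two values could feed the request, the helper below reads the WAV header and builds the keyword arguments used in the RecognitionConfig example earlier in this thread. The helper name recognition_config_kwargs is made up, and LINEAR16 is an assumption that only fits 16-bit PCM files.

```python
import wave


def recognition_config_kwargs(audio_file_name, language_code="en-US"):
    """Build RecognitionConfig keyword arguments from a WAV file's header."""
    with wave.open(audio_file_name, "rb") as wave_file:
        frame_rate = wave_file.getframerate()
        channels = wave_file.getnchannels()
    return {
        "encoding": "LINEAR16",  # assumption: the file holds 16-bit PCM
        "language_code": language_code,
        "sample_rate_hertz": frame_rate,
        "audio_channel_count": channels,
    }


# Intended use (assumes google-cloud-speech is installed):
# config = speech.types.RecognitionConfig(**recognition_config_kwargs("recording.wav"))
```

This way the sample rate and channel count in the request always match the file, instead of being hard-coded.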