I keep getting the error <code>InvalidArgument: 400</code> from Google Speech-to-Text. The problem seems to be that I am using 2-channel (stereo) audio, while the API expects a mono WAV file. Converting the file in an audio editor might work, but I cannot use an audio editor to convert a batch of files. Is there a way to change the audio type in either Python or Google Cloud? Note: I already tried the Python "wave" module, but I kept getting an error #7 saying the file type was not recognized (I couldn't read the WAV file with it). <blockquote> -ERROR- InvalidArgument: 400 Must use single channel (mono) audio, but WAV header indicates 2 channels. </blockquote>


Google Speech-to-text API, InvalidArgument: 400 Must use single channel (mono)

2 Answers

Assuming you're using the google-cloud-speech library, you can set the audio_channel_count property in your RecognitionConfig to the number of channels in the input audio data (it defaults to one channel, i.e. mono). You could do something like this:

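from google.cloud import speech

# Written against the older google-cloud-speech API; 2.x releases expose these
# classes directly as speech.RecognitionAudio and speech.RecognitionConfig.
client = speech.SpeechClient()
results = client.recognize(
    audio=speech.types.RecognitionAudio(
        uri='gs://your-bucket/recording.wav',
    ),
    config=speech.types.RecognitionConfig(
        encoding='LINEAR16',
        language_code='en-US',
        sample_rate_hertz=44100,
        audio_channel_count=2,  # tell the API the file is stereo
    ),
)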

See the API doc for further info.

answered Sep 21 '22 13:09

LundinCast

You can use the function below to read the channel count and frame rate dynamically. It takes the audio file path and returns the frame rate and the number of channels.

import wave

def frame_rate_channel(audio_file_name):
    print(audio_file_name)
    # Read the sample rate and channel count from the WAV header.
    with wave.open(audio_file_name, "rb") as wave_file:
        frame_rate = wave_file.getframerate()
        channels = wave_file.getnchannels()
    return frame_rate, channels

answered Sep 23 '22 13:09

syed irfan
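The two answers can be combined so that the values hard-coded in the first one come from the file itself. This is only a sketch: `recognition_config_kwargs` is a hypothetical helper, it needs a local copy of the WAV file (the first answer points at a `gs://` URI), and the `LINEAR16` encoding is an assumption that holds only for 16-bit PCM data.

```python
import wave


def recognition_config_kwargs(audio_file_name, language_code="en-US"):
    """Build RecognitionConfig keyword arguments from a local WAV header."""
    with wave.open(audio_file_name, "rb") as wave_file:
        frame_rate = wave_file.getframerate()
        channels = wave_file.getnchannels()
    return {
        "encoding": "LINEAR16",  # assumption: 16-bit PCM WAV data
        "language_code": language_code,
        "sample_rate_hertz": frame_rate,
        "audio_channel_count": channels,
    }
```

The result could then be passed as `speech.types.RecognitionConfig(**recognition_config_kwargs("recording.wav"))`, following the first answer. The RecognitionConfig reference also describes an `enable_separate_recognition_per_channel` flag; without it, only the first channel is reportedly transcribed, so it is worth checking the API doc before relying on stereo input.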


Tags: python, google-cloud-speech

Jose silvestre Rodriguez Ortiz
