
400 Specify MP3 encoding to match audio file

I am trying to use the Google Speech-to-Text API; however, I keep getting "Specify MP3 encoding to match audio file" even though I have set up my code to go through all available encoders.

This is the file I am trying to use

I should add that if I upload the file through their UI, I do get an output, so I assume there is nothing wrong with the source file.

import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16, 
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS, 
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file
        response = []
        try:
            response = client.recognize(config, audio)
        except Exception as exc:
            # Print the API error instead of silently discarding it
            print("error:", exc)
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))

Alternatively, there is another file here in Persian ('fa-IR'), with which I face a similar issue. I initially posted the Obama file because it is easier to understand. I would appreciate it if you could test your answer with the second file as well.

Areza asked Aug 14 '19 20:08


2 Answers

It seems like you're looping over every explicit encoding value in your list, none of which is an MP3 encoding. I found that:

encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED

works for MP3 files. So try this:

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io
speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1.SpeechClient()

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent.
    # If this fails, try one of 8000, 12000, 16000, 24000 or 48000.
    sample_rate_hertz = 16000

    # Encoding of audio data sent. ENCODING_UNSPECIFIED worked for MP3 here;
    # the field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))

sample_recognize(speech_file)

The above code is a slightly modified sample from the Speech-to-Text docs. If that doesn't work, try looking deeper into the encoding docs and best practices. Good luck.
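Before picking an encoding, it can help to confirm what the file really contains, since the extension alone is not proof. The following is a rough sketch that guesses the format from well-known leading bytes. The function names sniff_audio_container and sniff_file are hypothetical, and the check is only a heuristic: an ID3 tag or an MPEG-style sync word usually means MP3, but not always.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the audio format from the first bytes of a file (heuristic)."""
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "wav"
    if header.startswith(b"OggS"):
        return "ogg"
    # Check the longer AMR-WB magic before the plain AMR one.
    if header.startswith(b"#!AMR-WB"):
        return "amr-wb"
    if header.startswith(b"#!AMR"):
        return "amr"
    # An ID3v2 tag usually (not always) precedes MP3 data.
    if header.startswith(b"ID3"):
        return "mp3"
    # A bare MPEG audio frame starts with an 11-bit sync word of all ones.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to run the heuristic above."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

Calling sniff_file('chunk7.mp3') on the file from the question would show whether it really starts like an MP3, in which case none of the FLAC, LINEAR16 or OGG_OPUS settings can match it.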

Joe Martinez answered Oct 17 '22 04:10


It looks like your audio is in an unsupported format. The easiest fix is to convert it to another format (FLAC is advised). You have two options:

  • Search Google for an online audio converter
  • Convert it yourself on your machine:

    1) Install sox (audio editing)

    2) Install the encoders it needs:

     * [lame](http://lame.sourceforge.net) mp3 encoder
     * [flac](https://xiph.org/flac/download.html) flac encoder
    

    3) Run the command:

    sox source.mp3 --channels=1 --bits=16 dest.flac

You can also run the command from Python:

import subprocess
subprocess.check_output(['sox',sourcePath,'--channels=1','--bits=16',destPath]) 
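A slightly more defensive version of the call above might look like the sketch below. The helper names build_sox_command and convert_to_flac are hypothetical; the sox arguments are the same ones used in the command above, and the code assumes sox and its encoders are already installed.

```python
import shutil
import subprocess


def build_sox_command(source_path, dest_path, channels=1, bits=16):
    """Build the argument list for the sox conversion shown above."""
    return [
        "sox",
        str(source_path),
        f"--channels={channels}",
        f"--bits={bits}",
        str(dest_path),
    ]


def convert_to_flac(source_path, dest_path):
    """Convert an audio file with sox; raises if sox is missing or fails."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH; install it first")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_sox_command(source_path, dest_path), check=True)
    return dest_path
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.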

Note that you don't need to specify either sample_rate_hertz or encoding, because that information is already in the FLAC headers, so you can omit them:

config = types.RecognitionConfig(language_code="fa-IR")
response = client.recognize(config, audio)
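To see that the sample rate and channel layout really are stored in the FLAC file, you can read them from the STREAMINFO block at the start of the stream. This is a minimal sketch, assuming a standard FLAC file with no extra data before the 'fLaC' marker; the function names are hypothetical.

```python
def parse_flac_streaminfo(header: bytes) -> dict:
    """Extract basic audio parameters from the first 26+ bytes of a FLAC file."""
    if len(header) < 26 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Byte 4: 1-bit "last block" flag followed by a 7-bit block type.
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Skip marker (4), block header (4), block sizes (2 + 2), frame sizes (3 + 3).
    # The next 64 bits pack: sample rate (20), channels - 1 (3),
    # bits per sample - 1 (5), total samples (36).
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate_hertz": packed >> 44,
        "channels": ((packed >> 41) & 0x07) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def read_flac_info(path: str) -> dict:
    """Read the start of a file and parse its STREAMINFO block."""
    with open(path, "rb") as f:
        return parse_flac_streaminfo(f.read(42))
```

For a file produced by the sox command above, read_flac_info('dest.flac') should report one channel and 16 bits per sample.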

Resources: troubleshooting

John Balvin Arias answered Oct 17 '22 04:10