Google Cloud Speech-to-Text (MP3 to text)

Question

I am using the Google Cloud Platform Speech-to-Text API on a trial account, and I cannot get any text back from an audio file. I do not know which encoding and sampleRateHertz values to use for an MP3 file with a bit rate of 128 kbps. I have tried various options, but none of them returns a transcription.

const speech = require('@google-cloud/speech');

const config = {
  encoding: 'LINEAR16',    // options: AMR, AMR_WB, LINEAR16 (for WAV)
  sampleRateHertz: 16000,  // 16000 gives a blank result
  languageCode: 'en-US'
};

Grokify · Accepted Answer

MP3 is now supported in beta:

MP3: Only available as beta. See RecognitionConfig reference for details.

https://cloud.google.com/speech-to-text/docs/encoding

MP3: MP3 audio. Support all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can be optionally unset if not known.

https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#AudioEncoding

You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:

https://en.wikipedia.org/wiki/44,100_Hz
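
If no such tool is at hand, the sample rate can also be read from the file itself, since every MPEG audio frame header encodes it. The following is a minimal Node.js sketch, assuming the file is a plain MPEG audio stream that may start with an ID3v2 tag. The function name mp3SampleRate is made up for illustration, and the sync search is a heuristic that could be fooled by unusual files.

```javascript
// Sample rates in Hz, keyed by the 2-bit MPEG version field of the frame
// header and indexed by the 2-bit sample-rate field.
const SAMPLE_RATES = {
  0b11: [44100, 48000, 32000], // MPEG-1
  0b10: [22050, 24000, 16000], // MPEG-2
  0b00: [11025, 12000, 8000],  // MPEG-2.5
};

// Hypothetical helper: returns the sample rate of the first plausible
// MPEG audio frame in `buf` (a Buffer), or null if none is found.
function mp3SampleRate(buf) {
  let i = 0;

  // Skip a leading ID3v2 tag: 10-byte header with a 28-bit "synchsafe" size.
  if (buf.length >= 10 && buf.toString('latin1', 0, 3) === 'ID3') {
    const size =
      ((buf[6] & 0x7f) << 21) | ((buf[7] & 0x7f) << 14) |
      ((buf[8] & 0x7f) << 7) | (buf[9] & 0x7f);
    i = 10 + size;
  }

  for (; i + 3 < buf.length; i++) {
    // A frame starts with 11 set bits (the sync word).
    if (buf[i] !== 0xff || (buf[i + 1] & 0xe0) !== 0xe0) continue;

    const version = (buf[i + 1] >> 3) & 0x3;
    const layer = (buf[i + 1] >> 1) & 0x3;
    const bitrateIndex = buf[i + 2] >> 4;
    const rateIndex = (buf[i + 2] >> 2) & 0x3;

    // Reject reserved values; they indicate a false sync match.
    if (version === 0b01 || layer === 0 || bitrateIndex === 0xf || rateIndex === 3) {
      continue;
    }
    return SAMPLE_RATES[version][rateIndex];
  }
  return null;
}
```

A typical call would be mp3SampleRate(require('fs').readFileSync('audio.mp3')), where audio.mp3 is a placeholder path.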

To use this in a Google SDK, you may need to use one of the beta SDKs that defines this. Here is the constant from the Go Beta SDK:

RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8

https://godoc.org/google.golang.org/genproto/googleapis/cloud/speech/v1p1beta1
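
For the Node.js client used in the question, the equivalent would look roughly like the sketch below. It assumes the installed @google-cloud/speech package exposes the v1p1beta1 client and that credentials are already configured; the helper names buildMp3Request and transcribeMp3 are made up for illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: builds a recognize() request for MP3 audio.
// sampleRateHertz is only set when the caller knows it, since the
// beta MP3 encoding allows it to be left unset.
function buildMp3Request(audioBytes, sampleRateHertz) {
  const config = {
    encoding: 'MP3',       // only defined in the v1p1beta1 API
    languageCode: 'en-US',
  };
  if (sampleRateHertz) {
    config.sampleRateHertz = sampleRateHertz;
  }
  return {
    config,
    audio: { content: audioBytes.toString('base64') },
  };
}

// Hypothetical helper: sends the file to the beta endpoint and joins the
// top alternative of each result into one transcript string.
async function transcribeMp3(path, sampleRateHertz) {
  // Loaded lazily so the request builder can be used without the package.
  const speech = require('@google-cloud/speech').v1p1beta1;
  const client = new speech.SpeechClient();

  const request = buildMp3Request(fs.readFileSync(path), sampleRateHertz);
  const [response] = await client.recognize(request);

  return response.results
    .map((result) => result.alternatives[0].transcript)
    .join('\n');
}

// Example call (placeholder path):
// transcribeMp3('audio.mp3', 44100).then(console.log).catch(console.error);
```

Note that the synchronous recognize call is intended for short clips of roughly a minute or less; longer recordings would need the long-running variant of the API.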

Tags: mp3, speech-to-text, google-cloud-speech

Vikash Patel

1 Answer

Grokify
