I am working with the Google Cloud Speech-to-Text samples. I took a sample from this link: GoogleCloudPlatform speech to text sample, and I followed Quickstart: Using Client Libraries. The sample files given in that example work fine: the service returns the text of the audio file. But if I give it my own audio file, it does not return anything.
The cloud request includes the audio file, AudioEncoding and SampleRateHertz. The issue may be in the AudioEncoding and SampleRateHertz values for my own audio file.
How can I identify the AudioEncoding and SampleRateHertz of an audio file?
AudioEncoding's Java enum has the following possible values:
AudioEncoding.AMR -> .amr/.3gp files
AudioEncoding.AMR_WB -> .awb/.3gp files
AudioEncoding.FLAC -> .flac files
AudioEncoding.LINEAR16 -> .wav files
AudioEncoding.MULAW -> .wav files
AudioEncoding.OGG_OPUS -> .ogg/.opus files
AudioEncoding.SPEEX_WITH_HEADER_BYTE -> no clue, maybe .speex
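As a rough illustration, the extension list above can be written as a small lookup. This is only a sketch: the class and method names (EncodingGuess, candidatesFor) are made up for this example and are not part of the Speech-to-Text client library, and the returned strings merely mirror the enum constant names. It also assumes the conventional extensions .amr for narrowband AMR and .awb for AMR-WB. An extension is a hint, not proof of the actual codec inside the file.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of any Google library.
class EncodingGuess {
    // File extension -> candidate AudioEncoding constant names.
    // Assumption: .amr is narrowband AMR and .awb is AMR-WB.
    private static final Map<String, List<String>> BY_EXTENSION = Map.of(
        "amr", List.of("AMR"),
        "awb", List.of("AMR_WB"),
        "3gp", List.of("AMR", "AMR_WB"),
        "flac", List.of("FLAC"),
        "wav", List.of("LINEAR16", "MULAW"),
        "ogg", List.of("OGG_OPUS"),
        "opus", List.of("OGG_OPUS"));

    // Returns the candidate encodings for a file name, or an empty list
    // when the extension is missing or unknown.
    static List<String> candidatesFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return List.of();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, List.of());
    }

    public static void main(String[] args) {
        System.out.println(candidatesFor("commercial_stereo.wav")); // prints [LINEAR16, MULAW]
    }
}
```

When more than one candidate comes back (as for .wav or .3gp), the file's header has to be inspected to decide between them.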
So you could make a first guess from the file extension. For the SampleRateHertz, you could use a tool like Apache Tika. For commercial_stereo.wav it outputs the following:
Content-Length: 6305632
Content-Type: audio/vnd.wave
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.audio.AudioParser
X-TIKA:digest:MD5: 7e3e8837273e8bb143533894926f7da3
X-TIKA:digest:SHA256: 98fac004fb662ad8f720e680c81e3b4c9dea20190f5d1d908cece2cd6b30f01e
bits: 16
channels: 2
encoding: PCM_SIGNED
resourceName: commercial_stereo.wav
samplerate: 44100.0
xmpDM:audioSampleRate: 44100
xmpDM:audioSampleType: 16Int
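For .wav files, the same fields (encoding, sample rate, channels) can also be read with the JDK's own javax.sound.sampled package, without Tika. The following is a minimal sketch, not a definitive implementation: WavProbe and suggestEncoding are hypothetical names, the returned strings only mirror the AudioEncoding constant names, and the mapping assumes that LINEAR16 means 16-bit signed little-endian PCM and MULAW means mu-law samples. The JDK's built-in readers typically handle WAV, AIFF and AU only, so FLAC, AMR or Opus files would still need another tool.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper; it uses only javax.sound.sampled from the JDK.
class WavProbe {
    // Maps a decoded audio format to the AudioEncoding name it most likely needs.
    // Returns null when neither LINEAR16 nor MULAW fits.
    static String suggestEncoding(AudioFormat format) {
        AudioFormat.Encoding enc = format.getEncoding();
        if (AudioFormat.Encoding.PCM_SIGNED.equals(enc)
                && format.getSampleSizeInBits() == 16
                && !format.isBigEndian()) {
            return "LINEAR16"; // assumed: 16-bit signed little-endian PCM
        }
        if (AudioFormat.Encoding.ULAW.equals(enc)) {
            return "MULAW";
        }
        return null;
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: java WavProbe.java <file.wav>");
            return;
        }
        // Reading the stream header is enough; no audio data is decoded here.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat format = in.getFormat();
            System.out.println("encoding guess:  " + suggestEncoding(format));
            System.out.println("sampleRateHertz: " + Math.round(format.getSampleRate()));
            System.out.println("channels:        " + format.getChannels());
        }
    }
}
```

Run it as `java WavProbe.java commercial_stereo.wav` (Java 11 or later). The printed sample rate is the value to put into SampleRateHertz in the request.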