These are the Google Speech API docs: https://cloud.google.com/speech/docs/sync-recognize
I have tried this API for 2 weeks, but I still can't achieve my main goal (translating a live stream).
I'm using PHP. (Suggestions in other languages are welcome; I will work out the rest myself.)
What I have managed to do in these 2 weeks:
Synchronous Speech Recognition (<=1min)
Asynchronous Speech Recognition (>1min and <=80min). Note: I can modify this to accept a 3-hour video.
Live speech recognition from mic : https://www.google.com/intl/en/chrome/demos/speech.html
UPDATE: Streaming recognition through the API, but only with audio shorter than 6 seconds.
What I can't do:
Translate a live stream, e.g. a radio stream (a delay is acceptable).
Translate a video/audio file while it is playing (a delay is acceptable).
UPDATE:
I also asked this question on Google's GitHub, but since I got no answer there, I am asking here.
Summary:
I can perform streaming speech recognition, but only with 6 seconds of audio. This is not what I expected. I expected to be able to recognize audio of unlimited duration (we don't know when a radio stream will end).
Thanks for any help. I really appreciate it.
UPDATE:
To prove that I can't use a video longer than 6 seconds, here is what I did:
I took this video, interview.mp4, and converted it to interview.flac with ffmpeg using this command:
ffmpeg -i interview.mp4 -c:a flac -ar 16000 -ac 1 -sample_fmt s16 interview.flac
I used this library to transcribe the video with this command:
php speech.php transcribe --encoding FLAC --language-code en-US --sample-rate 16000 --stream interview.flac
and the result is:
[Google\GAX\ApiException]
Invalid 'audio_content': too long.
It can't be too long, because the video is only 48 seconds long. This is the metadata from the ffmpeg output:
Output #0, flac, to 'interview.flac':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.72.101
Stream #0:0(und): Audio: flac, 16000 Hz, mono, s16, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
encoder : Lavc57.92.100 flac
size= 810kB time=00:00:48.01 bitrate= 138.1kbits/s speed= 108x
video:0kB audio:801kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.019650%
You need to use the StreamingRecognize API call. You can find an example of doing that in PHP here.
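As a rough illustration of the idea behind a streaming call, here is a minimal sketch in Python (the question allows other languages) of the chunking step only: rather than sending the whole file as a single audio_content payload, the audio is read in small pieces that a streaming request would send one at a time. The function name iter_audio_chunks and the 32 KiB chunk size are assumptions made for this example, not part of the Speech API or the PHP library, and the StreamingRecognize call itself is deliberately left out.

```python
import io
from typing import BinaryIO, Iterator

# Assumed chunk size in bytes, chosen for illustration only;
# it is not a value mandated by the Speech API.
CHUNK_SIZE = 32 * 1024


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive pieces of at most chunk_size bytes until the stream ends."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of file or a closed stream
            break
        yield chunk


if __name__ == "__main__":
    # Stand-in for an audio file or a radio stream: 100,000 zero bytes.
    fake_audio = io.BytesIO(b"\x00" * 100_000)
    sizes = [len(chunk) for chunk in iter_audio_chunks(fake_audio)]
    print(sizes)  # prints [32768, 32768, 32768, 1696]
```

With a real file, open("interview.flac", "rb") would replace the BytesIO stand-in. For a radio stream, the same loop would keep yielding chunks for as long as the connection delivers data, which is why a generator fits a source whose end is not known in advance.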