I started using the Google Speech API to transcribe audio.
The audio being transcribed contains many numbers spoken one after the other.
E.g. 273 298.
But the transcription comes back as 270-3298.
My guess is that it is interpreting it as some sort of phone number.
What I want is unparsed output, e.g. "two seventy three two ninety eight", which I can deal with and parse on my own.
Is there a setting or support for this kind of thing?
thanks
So I had this exact same problem and I think we found a solution. If you're using English as input, switch to en-PH just when working with numbers. Google will then not format the result as a U.S. phone number or try to stick an extra digit in there.
Try passing a speech context with some phrase hints. How to use it is documented here: https://cloud.google.com/speech/docs/basics#phrase-hints
Give it the spelled-out numbers that you want recognized.
"speech_context": {
"phrases": ["zero", "one", "two", ... "nine", "ten", "eleven", ... "twenty", "thirty", ..., "ninety"]
}
This isn't guaranteed to work, but it may help.
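If it helps, here is a minimal sketch of building that phrase list in Python rather than typing it out by hand. The key names "speech_context" and "phrases" are copied from the snippet above and may be named differently in the API version you are on, so check them against the docs. The helper names (number_phrase_hints, build_speech_context) are made up for illustration.

```python
import json

# Spelled-out number words to offer as phrase hints.
UNITS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_phrase_hints():
    """Return the spelled-out number words as one flat list."""
    return UNITS + TEENS + TENS


def build_speech_context():
    """Wrap the hints in the structure shown in the answer above.

    Assumption: the key names mirror that snippet and are not
    verified against any particular API version.
    """
    return {"speech_context": {"phrases": number_phrase_hints()}}


if __name__ == "__main__":
    # Print the fragment so it can be merged into a request body.
    print(json.dumps(build_speech_context(), indent=2))
```

This only produces the hint fragment. You would still merge it into whatever recognition config your request already sends, and as noted, the hints are a nudge rather than a guarantee.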
For the record, I tried blambert's solution above and it doesn't work, unfortunately. I recently posted another question to see if anyone has found a way to defeat this behavior, as it is preventing me from implementing a transcription service that I had planned.