
Google Speech Recognition API: timestamp for each word?

Tags:

audio

speech-recognition

speech

speech-to-text

google-speech-api

It's possible to use Google's Speech Recognition API to get a transcription of an audio file (WAV, MP3, etc.) by sending a request to http://www.google.com/speech-api/v2/recognize?...

Example: I said "one two three four five" in a WAV file. The Google API returns this:

{
  u'alternative':
  [
    {u'transcript': u'12345'},
    {u'transcript': u'1 2 3 4 5'},
    {u'transcript': u'one two three four five'}
  ],
  u'final': True
}

Question: is it possible to get the start and end time (in seconds) of each spoken word?

With my example:

['one', 0.23, 0.80], ['two', 1.03, 1.45], ['three', 1.79, 2.35], etc.

i.e. the word "one" was spoken between 00:00:00.23 and 00:00:00.80,
and the word "two" was spoken between 00:00:01.03 and 00:00:01.45.

PS: I'm looking for an API that supports languages other than English, especially French.


asked Dec 04 '15 10:12

Basj

2 Answers

I believe the other answer is now out of date. This is now possible with the Google Cloud Speech-to-Text API, which can return time offsets for each word: https://cloud.google.com/speech/docs/async-time-offsets
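As a rough sketch of what to do with the result: as far as I know, when word time offsets are enabled (the `enableWordTimeOffsets` field of the REST recognition config), each alternative carries a `words` list whose entries have `word`, `startTime` and `endTime`, with the times given as strings such as "1.300s". The helper name `words_with_times` and the sample response below are made up for illustration (the numbers are taken from the question); the code only shows how to flatten such a response into the `[word, start, end]` form asked for, and does not call the API itself.

```python
def parse_seconds(duration):
    """Convert a duration string such as '1.300s' to seconds as a float."""
    return float(duration.rstrip("s"))


def words_with_times(response):
    """Flatten the top alternative of each result into [word, start, end]."""
    timed = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Only the first (most likely) alternative is used here.
        for info in alternatives[0].get("words", []):
            timed.append([
                info["word"],
                # Assumption: fall back to "0s" if a time field is absent.
                parse_seconds(info.get("startTime", "0s")),
                parse_seconds(info.get("endTime", "0s")),
            ])
    return timed


# Made-up sample, shaped like a JSON response with word time offsets.
sample_response = {
    "results": [{
        "alternatives": [{
            "transcript": "one two three",
            "words": [
                {"word": "one", "startTime": "0.230s", "endTime": "0.800s"},
                {"word": "two", "startTime": "1.030s", "endTime": "1.450s"},
                {"word": "three", "startTime": "1.790s", "endTime": "2.350s"},
            ],
        }]
    }]
}

timed_words = words_with_times(sample_response)
print(timed_words)
```

For French, the recognition config would also need the matching language code (for example `fr-FR`).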

answered Oct 06 '22 20:10

deweydb

EDIT 2020: This is now possible; see the other answer.

When this answer was written, it was not possible with the Google API.

If you want word timestamps, you can use other APIs, for example:

Vosk-API - free offline speech recognition API (disclosure: I am the primary author of Vosk).

SpeechMatics SaaS speech recognition API

Speech Recognition API from IBM
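For the Vosk option, a minimal sketch of reading the timings: to my understanding, once word output is enabled on the recognizer (`SetWords(True)` in the Python bindings), the JSON result contains a `result` list with `word`, `start`, `end` and `conf` for each word, with times in seconds. The helper name `vosk_words` and the sample JSON below are made up for illustration (the numbers come from the question); the code only parses such a result and does not run the recognizer.

```python
import json


def vosk_words(result_json):
    """Turn a Vosk-style JSON result string into [word, start, end] lists."""
    data = json.loads(result_json)
    # "result" is missing when no words were recognized, so default to [].
    return [[w["word"], w["start"], w["end"]] for w in data.get("result", [])]


# Made-up sample, shaped like a recognizer result with word output enabled.
sample_result = json.dumps({
    "result": [
        {"conf": 1.0, "start": 0.23, "end": 0.80, "word": "one"},
        {"conf": 1.0, "start": 1.03, "end": 1.45, "word": "two"},
        {"conf": 1.0, "start": 1.79, "end": 2.35, "word": "three"},
    ],
    "text": "one two three",
})

vosk_timed = vosk_words(sample_result)
print(vosk_timed)
```

French would need a French model loaded in place of the English one.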

answered Oct 06 '22 20:10

Nikolay Shmyrev
