How to use the Google Speech Recognition API in Python? [closed]

I have an MP3 file, and I want to use Google's speech recognition to extract the text from it. Any pointers to documentation or examples would be appreciated.

Asked Aug 01 '16 at 16:08 by Vivek Anand


1 Answer

Take a look at the Google Cloud Speech API, which enables developers to convert audio to text [...] The API recognizes over 80 languages and variants [...] You can create a free account to get a limited number of API requests.

HOW TO:

First, install the gcloud and google-api-python-client Python modules:

pip install --upgrade gcloud
pip install --upgrade google-api-python-client

Then, in the Cloud Platform Console, go to the Projects page and select an existing project or create a new one. Next, enable billing for the project, and then enable the Cloud Speech API.

After enabling the Cloud Speech API, click the Go to Credentials button to set up your Cloud Speech API credentials.

See Set Up a Service Account for information on how to authenticate to the Cloud Speech API from your code.

You should end up with a service account key file (in JSON format) and a GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to it; together these let your code authenticate to the Speech API.

Once that is done, download Google's sample raw audio file and the speech-discovery_google_rest_v1.json discovery document.
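The raw sample has no header: it is just signed 16-bit samples. At the mono, 16 kHz settings used in the request config, one second of audio therefore takes 16000 × 2 = 32,000 bytes, so a file's length can be estimated from its size. That matters because synchronous recognition is meant for short clips (about one minute in Google's current docs; treat that limit as an assumption and check the quotas page). A minimal sketch, with hypothetical helper names:

```python
import os

BYTES_PER_SAMPLE = 2  # LINEAR16 = signed 16-bit samples


def raw_linear16_duration(num_bytes, sample_rate=16000, channels=1):
    """Duration in seconds of headerless LINEAR16 audio of the given size."""
    return num_bytes / float(BYTES_PER_SAMPLE * sample_rate * channels)


def check_sync_length(path, sample_rate=16000, max_seconds=60):
    """Return (duration_seconds, fits) for a raw mono LINEAR16 file.

    max_seconds=60 is an assumed synchronous-recognition limit;
    verify it against the current Speech API quotas.
    """
    seconds = raw_linear16_duration(os.path.getsize(path), sample_rate)
    return seconds, seconds <= max_seconds
```

For example, a 1,920,000-byte file at 16 kHz mono works out to exactly 60 seconds.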

Modify the downloaded JSON file to set your credentials key, then make sure the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the full path of your service account .json key file:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_file.json

Also make sure the GCLOUD_PROJECT environment variable is set to the ID of your Google Cloud project:

export GCLOUD_PROJECT=your-project-id
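Most failures at this stage come from a missing or wrong environment variable. A minimal pre-flight sketch (the function name is hypothetical) checks that GOOGLE_APPLICATION_CREDENTIALS points to a JSON file that looks like a service account key, with "type": "service_account", "client_email" and "private_key", and that GCLOUD_PROJECT is set. It does not contact Google, so an empty result does not prove the key is valid:

```python
import json
import os


def check_speech_env():
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(key_path):
        problems.append("credentials file not found: %s" % key_path)
    else:
        try:
            with open(key_path) as f:
                key = json.load(f)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            problems.append("credentials file is not valid JSON")
        else:
            if not isinstance(key, dict) or key.get("type") != "service_account":
                problems.append("credentials file is not a service account key")
            else:
                # Fields present in keys downloaded from the console.
                for field in ("client_email", "private_key"):
                    if field not in key:
                        problems.append("credentials file is missing %r" % field)
    if not os.environ.get("GCLOUD_PROJECT"):
        problems.append("GCLOUD_PROJECT is not set")
    return problems


if __name__ == "__main__":
    for problem in check_speech_env():
        print("problem:", problem)
```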

With all of that in place, create a tutorial.py file containing:

import argparse
import base64
import json

from googleapiclient import discovery
import httplib2
from oauth2client.client import GoogleCredentials


# Discovery document template; discovery.build() fills in {api} and
# {apiVersion} from its serviceName and version arguments.
DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?'
                 'version={apiVersion}')


def get_speech_service():
    """Build an authorized client for the Cloud Speech REST API."""
    # Loads the service account key named by GOOGLE_APPLICATION_CREDENTIALS.
    credentials = GoogleCredentials.get_application_default().create_scoped(
        ['https://www.googleapis.com/auth/cloud-platform'])
    http = credentials.authorize(httplib2.Http())

    return discovery.build(
        'speech', 'v1', http=http, discoveryServiceUrl=DISCOVERY_URL)


def main(speech_file):
    """Transcribe the given audio file.

    Args:
        speech_file: path to a raw LINEAR16 audio file
            (16-bit signed little-endian, mono, 16 kHz).
    """
    with open(speech_file, 'rb') as speech:
        # The REST API expects the audio bytes as a base64 string.
        speech_content = base64.b64encode(speech.read())

    service = get_speech_service()
    # Synchronous recognition: the whole clip is sent in one request,
    # so it is only suitable for short audio.
    service_request = service.speech().recognize(
        body={
            'config': {
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRateHertz': 16000,  # 16 kHz
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8'),
            },
        })
    response = service_request.execute()
    print(json.dumps(response))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'speech_file', help='Full path of the raw audio file to be recognized')
    args = parser.parse_args()
    main(args.speech_file)
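tutorial.py declares the audio as LINEAR16 at 16000 Hz, so the MP3 from the question cannot be passed in directly; it first has to be decoded to headerless signed 16-bit little-endian mono PCM at that rate. A minimal sketch, assuming the ffmpeg command-line tool is installed and on PATH (the helper names are hypothetical):

```python
import shlex
import subprocess


def ffmpeg_mp3_to_linear16_cmd(mp3_path, raw_path, sample_rate=16000):
    """Build an ffmpeg command that decodes an MP3 into raw LINEAR16 audio."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", mp3_path,           # input MP3
        "-f", "s16le",            # headerless signed 16-bit little-endian output
        "-acodec", "pcm_s16le",   # PCM sample encoding
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the rate in the request config
        raw_path,
    ]


def convert_mp3_to_raw(mp3_path, raw_path, sample_rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_mp3_to_linear16_cmd(mp3_path, raw_path, sample_rate),
                   check=True)


if __name__ == "__main__":
    # Show the equivalent shell command without running it.
    print(shlex.join(ffmpeg_mp3_to_linear16_cmd("input.mp3", "audio.raw")))
```

Keeping the sample rate as a parameter makes it easy to keep the converted file and the sampleRateHertz value in the request in agreement.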

Then run:

python tutorial.py audio.raw
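The script prints the whole JSON response. Assuming the documented response shape, {"results": [{"alternatives": [{"transcript": ..., "confidence": ...}]}]}, where alternatives are ordered most likely first, a minimal sketch for pulling out just the text could look like this (extract_text.py and the function name are hypothetical):

```python
import json
import sys


def transcripts_from_response(response):
    """Return the top-ranked transcript of each result, in order."""
    texts = []
    # A response with no recognized speech may omit "results" entirely.
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            texts.append(alternatives[0].get("transcript", ""))
    return texts


if __name__ == "__main__":
    # Hypothetical usage:
    #   python tutorial.py audio.raw > response.json
    #   python extract_text.py response.json
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            texts = transcripts_from_response(json.load(f))
        print(" ".join(t.strip() for t in texts))
```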
Answered Sep 23 '22 at 04:09 by A STEFANI