Realtime offline speech recognition in Python

I've been working with Python speech recognition for the better part of a month now, building a JARVIS-like assistant. I've used the SpeechRecognition module with both the Google Speech API and Pocketsphinx, and I've also used Pocketsphinx directly without another module. While the recognition is accurate, these packages take a long time to process speech. They seem to work by recording from one point of silence to the next and then passing that recording to the STT engine. While the recording is being processed, no other sound can be recorded for recognition, which is a problem when I try to issue several complex commands in a row.

When I look at Google Assistant's voice recognition, Alexa's voice recognition, or macOS High Sierra's offline recognition, I see words being recognized as I say them, without any pause in the recording. I've seen this called realtime recognition, streaming recognition, and word-by-word recognition. Is there any way to do this in Python, preferably offline and without using a client?

I tried (unsuccessfully) to accomplish this by changing the pause threshold, speaking threshold, and non-speaking threshold of the SpeechRecognition recognizer, but that only made the audio segment strangely, and it still needed about a second after each recognition before it could record again.
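One partial workaround for the blocking described above is to decouple recording from recognition: a capture thread keeps pushing audio into a queue while the main thread runs the slow recognizer, so speech that arrives mid-recognition is buffered rather than dropped. The following is a minimal sketch using only the standard library. `run_pipeline` is a hypothetical helper, `chunks` stands in for the recorded phrases, and `recognize` stands in for whatever STT call you use (for example a SpeechRecognition backend). It does not produce word-by-word output; that needs a streaming engine.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional


def run_pipeline(chunks: Iterable[bytes],
                 recognize: Callable[[bytes], str]) -> List[str]:
    """Record and recognize concurrently (hypothetical helper).

    A capture thread feeds audio chunks into a queue while this thread
    runs the slow recognizer, so audio arriving during recognition is
    buffered in the queue instead of being lost.
    """
    audio_q: "queue.Queue[Optional[bytes]]" = queue.Queue()

    def capture() -> None:
        # In a real assistant this would read phrases from the microphone.
        for chunk in chunks:
            audio_q.put(chunk)
        audio_q.put(None)  # sentinel: no more audio

    threading.Thread(target=capture, daemon=True).start()

    results: List[str] = []
    while True:
        chunk = audio_q.get()  # blocks until the next chunk is available
        if chunk is None:
            break
        results.append(recognize(chunk))
    return results
```

Because there is a single consumer, results come back in the order the phrases were recorded.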

Elias N-d asked Jul 25 '18 18:07



2 Answers

Pocketsphinx can process streams; see:

Python pocketsphinx recognition from the microphone

Kaldi can process streams too, and it is more accurate than Pocketsphinx:

https://github.com/alphacep/kaldi-websocket-python/blob/master/test_local.py

Google speech API can also process streams, see here:

Google Streaming Speech Recognition on an Audio Stream Python

Nikolay Shmyrev answered Oct 27 '22 16:10



First, there is a Python library called Vosk. To install it on your computer, run this command:

    pip3 install vosk

For more details, see:

https://alphacephei.com/vosk/install

Next, download a model. Go to this page, choose your preferred model, and download it:

https://alphacephei.com/vosk/models

Here I use vosk-model-small-en-us-0.15 as my model.

The download is a compressed file. Unzip it into your project's root folder, like this:

    speech-recognition/
    ├─ vosk-model-small-en-us-0.15/   (unzipped model folder)
    └─ offline-speech-recognition.py  (Python script)
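If you prefer to script the download and unzip step, here is a minimal standard-library sketch. It assumes the archive is published at `https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip` and that it unpacks into a top-level folder with the same name as the archive; both are assumptions, so check the models page if the download or the folder check fails. The helper names `download_model` and `unzip_model` are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path

MODEL_NAME = "vosk-model-small-en-us-0.15"
# Assumed URL pattern for the archive; verify it on the models page.
MODEL_URL = f"https://alphacephei.com/vosk/models/{MODEL_NAME}.zip"


def download_model(url: str, zip_path: Path) -> Path:
    """Download the model archive to zip_path (needs internet, done once)."""
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def unzip_model(zip_path: Path, root: Path) -> Path:
    """Extract the archive into root and return the model folder.

    Assumes the archive holds a single top-level folder named like the
    archive itself (e.g. vosk-model-small-en-us-0.15/).
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(root)
    model_dir = root / zip_path.stem  # "name.zip" -> "name"
    if not model_dir.is_dir():
        raise FileNotFoundError(f"expected model folder at {model_dir}")
    return model_dir
```

For example, `unzip_model(download_model(MODEL_URL, Path(f"{MODEL_NAME}.zip")), Path("."))` would leave the model folder next to your script, matching the layout above. Recognition itself stays offline; only this one-time download needs a connection.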

Here is the full code:

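    import json

    import pyaudio
    from vosk import Model, KaldiRecognizer

    # Model folder, relative to the directory you run the script from
    model = Model("vosk-model-small-en-us-0.15")
    # The recognizer's sample rate must match the microphone stream's rate
    recognizer = KaldiRecognizer(model, 16000)

    mic = pyaudio.PyAudio()
    stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000,
                      input=True, frames_per_buffer=8192)
    stream.start_stream()

    try:
        while True:
            # Read 4096 frames; don't crash if the input buffer overflows
            data = stream.read(4096, exception_on_overflow=False)
            # AcceptWaveform() returns True once a phrase is complete
            if recognizer.AcceptWaveform(data):
                # Result() returns a JSON string such as {"text": "..."}
                text = json.loads(recognizer.Result())["text"]
                print(text)
    except KeyboardInterrupt:
        pass  # stop with Ctrl+C
    finally:
        stream.stop_stream()
        stream.close()
        mic.terminate()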
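To get the word-by-word behavior the question asks about, Vosk's `KaldiRecognizer` also has `PartialResult()`, which returns the current hypothesis for the phrase still being spoken, and `FinalResult()`, which flushes any remaining audio. The sketch below wraps the loop above in a generator that reports both. `stream_transcripts` is a hypothetical helper name, and the JSON keys it reads (`"text"` and `"partial"`) reflect my understanding of the Vosk Python API, so treat them as an assumption.

```python
import json
from typing import Iterable, Iterator, Tuple


def stream_transcripts(recognizer,
                       chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield ("partial", text) while a phrase is in progress and
    ("final", text) once the recognizer decides the phrase has ended.

    `recognizer` is assumed to behave like vosk.KaldiRecognizer:
    AcceptWaveform(bytes) -> bool, and Result(), PartialResult(),
    FinalResult() each returning a JSON string.
    """
    last_partial = ""
    for data in chunks:
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            last_partial = ""
            if text:
                yield ("final", text)
        else:
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            # Only report when the in-progress hypothesis changes
            if partial and partial != last_partial:
                last_partial = partial
                yield ("partial", partial)
    # Flush whatever audio is left when the input ends
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield ("final", text)
```

With the microphone stream from the full code, you could feed it `iter(lambda: stream.read(4096, exception_on_overflow=False), None)` and print each `(kind, text)` pair as it arrives. Passing the recognizer in as a parameter also makes the loop easy to test with a stand-in object.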

For more detail, you can read this article I've written: https://buddhi-ashen-dev.vercel.app/posts/offline-speech-recognition

Buddhi ashen answered Oct 27 '22 15:10
