
Python Speech recognition produces bad results

I'm trying to get my speech recognition script working but it can't understand me.

import pyaudio
import speech_recognition as sr

def initSpeech():
    r = sr.Recognizer()

    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source, duration=2)
        print("Set minimum energy threshold to {}".format(r.energy_threshold))
        print("Say something")

        audio = r.listen(source, phrase_time_limit=10)

        command = ""
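        try:
            command = r.recognize_google(audio)
        except sr.UnknownValueError:
            # The recognizer could not make sense of the audio
            print("Couldn't understand you!")
        except sr.RequestError as e:
            # The Google API was unreachable or rejected the request
            print("Recognition request failed: {}".format(e))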

        print(command)

initSpeech()

This is my code to recognize my voice, but it always prints the error message from the except branch. When I record my voice in Python with the following script and pass the resulting WAV file to the speech recognition, it works fine:

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

I use this script to record my voice and then pass the resulting file "output.wav" to the speech recognition as input.

EDIT:

With

with open("microphone-results.wav", "wb") as f:
    f.write(audio.get_wav_data())

I saved the audio that gets sent for analysis. It sounded really bad: low and slow, like in bad movies with a voice changer. Maybe this is a hint toward the solution. I already checked the chunk_size and sample_rate settings; they are identical to the settings in my recording script above. My system: Windows 10

There is also an issue on GitHub: issue 358

Python: 3.6

Thank you for your help!

Tobias Schäfer asked May 13 '18 14:05


1 Answer

Your audio is obviously not recorded properly, and this leads to recognition failure. My guess is that r.adjust_for_ambient_noise is failing you (automatic speech/silence detectors are not simple to implement). Start by removing this line and setting the threshold manually:

r.energy_threshold = 50
r.dynamic_energy_threshold = False
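
For intuition about what a fixed threshold does, here is a minimal sketch. It assumes the threshold is compared against the RMS level of each 16-bit audio chunk, which is my understanding of how the library uses it; the helper names rms_16bit and is_speech are made up for illustration and are not part of speech_recognition.

```python
import array
import math

def rms_16bit(pcm_bytes):
    """Return the root-mean-square level of 16-bit signed PCM data."""
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(pcm_bytes, energy_threshold=50):
    """Hypothetical helper: treat a chunk as speech if its RMS exceeds the threshold."""
    return rms_16bit(pcm_bytes) > energy_threshold

# Two synthetic chunks: one very quiet, one clearly louder than the threshold
quiet = array.array("h", [10, -10] * 512).tobytes()
loud = array.array("h", [1000, -1000] * 512).tobytes()
print(is_speech(quiet), is_speech(loud))
```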

After that, save the recorded audio to a .wav file and listen to it. You have to get the audio clear before you send it to the ASR engine.
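
Besides listening, it may help to look at the header of the saved file. The sketch below uses only the standard wave module; describe_wav is a hypothetical helper name, and the demo file it writes is synthetic silence standing in for a real recording.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Read basic format information from a WAV file header."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }

# Demo on a synthetic one-second silent file (mono, 16-bit, 16 kHz)
demo_path = os.path.join(tempfile.gettempdir(), "describe_wav_demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```

If the reported duration of a real recording is much longer than the time you actually spoke, the sample rate declared in the file probably does not match the rate at which the device captured the audio. That would fit a recording that sounds low and slow, but treat it as a hypothesis to check rather than a diagnosis.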

Also, I recommend making sure that you are using the microphone you intended to use:

print(sr.Microphone.list_microphone_names()[0])
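
The call above prints only the first device name, which is not necessarily the device you want. As a rough sketch, you could search the full list by name. find_microphones is a hypothetical helper, and the example device names are invented; on a real system you would pass the result of sr.Microphone.list_microphone_names() instead.

```python
def find_microphones(names, keyword):
    """Return (index, name) pairs whose name contains keyword, ignoring case."""
    keyword = keyword.lower()
    return [(i, n) for i, n in enumerate(names) if n and keyword in n.lower()]

# Invented example list standing in for sr.Microphone.list_microphone_names()
example_names = [
    "Microsoft Sound Mapper - Input",
    "Headset Microphone (USB Audio)",
    "Speakers (Realtek)",
]
matches = find_microphones(example_names, "headset")
print(matches)
```

The index of the match you pick can then be passed as sr.Microphone(device_index=...) so that the recognizer does not fall back to the system default device.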

igrinis answered Oct 26 '22 23:10