I need to record audio from the microphone and convert it into text. I have tried this conversion on several audio clips that I downloaded from the web, and it works fine. But when I try to convert a clip that I recorded from the microphone, it gives the following error.
Traceback (most recent call last):
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\speech_recognition\__init__.py", line 203, in __enter__
    self.audio_reader = wave.open(self.filename_or_fileobject, "rb")
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 510, in open
    return Wave_read(f)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 164, in __init__
    self.initfp(f)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 144, in initfp
    self._read_fmt_chunk(chunk)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 269, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 3
The code I am trying is as follows.
import speech_recognition as sr
import sounddevice as sd
from scipy.io.wavfile import write
# recording from the microphone
fs = 44100 # Sample rate
seconds = 3 # Duration of recording
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
sd.wait() # Wait until recording is finished
write('output.wav', fs, myrecording) # Save as WAV file
sound = "output.wav"
recognizer = sr.Recognizer()
with sr.AudioFile(sound) as source:
    recognizer.adjust_for_ambient_noise(source)
    print("Converting audio file to text...")
    audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio)
        print("The converted text:" + text)
    except Exception as e:
        print(e)
I looked at similar questions that have been answered, and they say the file needs to be converted to a different WAV format. Can someone point me to some code or a library that I can use for this conversion? Thank you in advance.
You wrote the file in float format:
soxi output.wav
Input File : 'output.wav'
Channels : 2
Sample Rate : 44100
Precision : 25-bit
Duration : 00:00:03.00 = 132300 samples = 225 CDDA sectors
File Size : 1.06M
Bit Rate : 2.82M
Sample Encoding: 32-bit Floating Point PCM
and the wave module can't read that encoding.
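If SoX is not available, the encoding can also be checked by reading the format tag from the file's `fmt ` chunk. The following is only a minimal, stdlib-only sketch: the helper name `wav_format_tag` and the hand-built demo header are made up for this illustration, and it assumes a well-formed little-endian RIFF file. A tag of 1 means integer PCM, which `wave` reads; a tag of 3 means IEEE float, which is the "unknown format: 3" in the traceback.

```python
import struct

def wav_format_tag(data: bytes) -> int:
    """Return wFormatTag from the 'fmt ' chunk of RIFF/WAVE bytes (illustrative helper)."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")

# Hand-built header for an empty stereo 32-bit float file (demo data only).
fmt = struct.pack("<HHIIHH", 3, 2, 44100, 44100 * 2 * 4, 8, 32)
body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt + b"data" + struct.pack("<I", 0)
demo_header = b"RIFF" + struct.pack("<I", len(body)) + body

print(wav_format_tag(demo_header))  # prints 3
```

To check a real file, pass its bytes instead, for example `wav_format_tag(open('output.wav', 'rb').read())`.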
To store the data in int16 format, do this:
import numpy as np
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
sd.wait() # Wait until recording is finished
write('output.wav', fs, myrecording.astype(np.int16)) # Save as WAV file in 16-bit format
You can't hear anything because you cast the floating-point values to integers, which is incorrect. Floating-point samples in a WAV file range from -1 to 1, while 16-bit PCM (integer) samples range from -32,768 to 32,767. So, in effect, your signal was converted from something like
[-1.4240753e-05, 4.3602209e-05, 1.0526689e-06, ..., 1.7763522e-02, 1.6644333e-02, 6.7148944e-03]
to
[0, 0, 0, ..., 0, 0, 0]
The above conversion is incorrect.
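The difference between casting and scaling can be seen on a few of the sample values quoted above. This is a small pure-Python sketch rather than the NumPy code from the question; the helper `to_int16` is hypothetical and assumes samples are scaled by 2**15, truncated toward zero, and clipped to the int16 range.

```python
# A few of the float samples quoted above.
samples = [-1.4240753e-05, 4.3602209e-05, 1.7763522e-02, 6.7148944e-03]

# A plain cast truncates toward zero, so every value in (-1, 1) becomes 0.
cast = [int(x) for x in samples]

def to_int16(x: float) -> int:
    """Scale a float sample in [-1, 1] to the int16 range, then clip (illustrative)."""
    return max(-32768, min(32767, int(x * 32768)))

# Scaling first keeps the shape of the waveform.
scaled = [to_int16(x) for x in samples]

print(cast)    # prints [0, 0, 0, 0]
print(scaled)  # prints [0, 1, 582, 220]
```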
To correctly convert the samples into integers (PCM format), you need to scale the values, not just cast them. One way of doing this is given below:

    import numpy as np

    def float2pcm(sig, dtype='int16'):
        sig = np.asarray(sig)
        dtype = np.dtype(dtype)
        i = np.iinfo(dtype)
        abs_max = 2 ** (i.bits - 1)
        offset = i.min + abs_max
        return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)
You can then call it right after the sd.wait() line, keeping the returned array so that the converted data is what gets written to the file:
myrecording = float2pcm(myrecording)
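As a sanity check that 16-bit PCM is what the `wave` module (and therefore `sr.AudioFile`) can open, here is a stdlib-only sketch that writes scaled samples and reads them back. It uses a synthetic sine tone and an in-memory buffer in place of a real microphone recording and `output.wav`, so the variable names and the tone itself are assumptions for illustration only.

```python
import io
import math
import struct
import wave

fs = 44100
# Synthetic 440 Hz tone as floats in [-1, 1], standing in for a recording.
floats = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 10)]

# Scale to the int16 range and clip, then pack as little-endian 16-bit samples.
ints = [max(-32768, min(32767, int(x * 32768))) for x in floats]
frames = struct.pack("<%dh" % len(ints), *ints)

buf = io.BytesIO()  # in-memory file; a path such as 'output.wav' would also work
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    w.setframerate(fs)
    w.writeframes(frames)

buf.seek(0)
with wave.open(buf, "rb") as r:  # opens without an "unknown format" error
    params = (r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())

print(params)  # prints (1, 2, 44100, 4410)
```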
Another, simpler way to solve your problem is to let the sounddevice library do this conversion internally, by recording with the following call instead.
import numpy as np
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2, dtype=np.int16)