
wave.Error: unknown format: 3 arises when trying to convert a wav file into text in Python

I need to record audio from the microphone and convert it into text. The conversion works fine with several audio clips that I downloaded from the web, but when I try to convert a clip that I recorded from the microphone myself, it gives the following error.

Traceback (most recent call last):
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\speech_recognition\__init__.py", line 203, in __enter__
    self.audio_reader = wave.open(self.filename_or_fileobject, "rb")
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 510, in open
    return Wave_read(f)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 164, in __init__
    self.initfp(f)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 144, in initfp
    self._read_fmt_chunk(chunk)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\wave.py", line 269, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 3

The code I am trying is as follows.

import speech_recognition as sr
import sounddevice as sd
from scipy.io.wavfile import write

# recording from the microphone
fs = 44100  # Sample rate
seconds = 3  # Duration of recording

myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
sd.wait()  # Wait until recording is finished
write('output.wav', fs, myrecording)  # Save as WAV file
sound = "output.wav"
recognizer = sr.Recognizer()

with sr.AudioFile(sound) as source:
     recognizer.adjust_for_ambient_noise(source)
     print("Converting audio file to text...")
     audio = recognizer.listen(source)

     try:
          text = recognizer.recognize_google(audio)
          print("The converted text:" + text)

     except Exception as e:
          print(e)

I looked at similar questions that have been answered, and they say the file needs to be converted into a different WAV format. Can someone point me to some code or a library that I can use for this conversion? Thank you in advance.

Hirushi Ekanayake asked Feb 22 '20 13:02



2 Answers

You wrote the file in float format:

soxi output.wav 

Input File     : 'output.wav'
Channels       : 2
Sample Rate    : 44100
Precision      : 25-bit
Duration       : 00:00:03.00 = 132300 samples = 225 CDDA sectors
File Size      : 1.06M
Bit Rate       : 2.82M
Sample Encoding: 32-bit Floating Point PCM

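The wave module can't read this encoding: it expects integer PCM (format tag 1), whereas format tag 3 marks IEEE floating-point samples.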
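
If SoX is not available, the same diagnosis can be made by reading the format tag straight from the file's "fmt " chunk. The sketch below uses only the standard library. `wav_format_tag` and `_minimal_wav` are hypothetical helper names, and the code assumes a plain little-endian RIFF/WAVE file. A tag of 1 means integer PCM; a tag of 3 means IEEE float, which matches the `unknown format: 3` in the traceback.

```python
import io
import struct


def wav_format_tag(fileobj):
    """Return the wFormatTag stored in the 'fmt ' chunk of a RIFF/WAVE stream."""
    riff, _size, wave_id = struct.unpack("<4sI4s", fileobj.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = fileobj.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            (format_tag,) = struct.unpack("<H", fileobj.read(2))
            return format_tag
        # Skip any other chunk; chunk bodies are padded to an even length.
        fileobj.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)


def _minimal_wav(format_tag, bits_per_sample):
    """Build a tiny in-memory mono WAV with no samples, for demonstration only."""
    channels, rate = 1, 44100
    block_align = channels * bits_per_sample // 8
    fmt = struct.pack("<HHIIHH", format_tag, channels, rate,
                      rate * block_align, block_align, bits_per_sample)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", 0))
    return b"RIFF" + struct.pack("<I", len(body)) + body


float_tag = wav_format_tag(io.BytesIO(_minimal_wav(3, 32)))  # float WAV
pcm_tag = wav_format_tag(io.BytesIO(_minimal_wav(1, 16)))    # 16-bit PCM WAV
print(float_tag, pcm_tag)
```

For a file on disk, the same helper can be called as `wav_format_tag(open('output.wav', 'rb'))`.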

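To store the recording in int16 format, scale the samples before converting them:

import numpy as np

myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
sd.wait()  # Wait until recording is finished
# Scale the float samples in [-1.0, 1.0] to the int16 range before casting;
# a bare astype(np.int16) would truncate almost every sample to 0.
pcm = (np.clip(myrecording, -1.0, 1.0) * 32767).astype(np.int16)
write('output.wav', fs, pcm)  # Save as WAV file in 16-bit format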
Nikolay Shmyrev answered Nov 04 '22 22:11



Method 1

Casting the recording directly with astype(np.int16) leaves you with silence, because a cast truncates each floating-point value to an integer instead of rescaling it. Floating-point samples in a WAV file range from -1 to 1, whereas 16-bit PCM (integer) samples range from -32,768 to 32,767. So, essentially, the signal gets converted from something like
[-1.4240753e-05, 4.3602209e-05, 1.0526689e-06, ..., 1.7763522e-02, 1.6644333e-02, 6.7148944e-03]
to
[0, 0, 0, ..., 0, 0, 0]

The above conversion is incorrect.
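
The difference between casting and converting can be checked on the sample values quoted above. The sketch below is written in plain Python so that it runs without NumPy; `int(x)` truncates toward zero, as a float-to-int16 cast does for in-range values. The scale factor of 32767 is one common convention, assumed here for illustration.

```python
# Sample values quoted above, as recorded in floating point.
samples = [-1.4240753e-05, 4.3602209e-05, 1.0526689e-06,
           1.7763522e-02, 1.6644333e-02, 6.7148944e-03]

# Casting: every value with magnitude below 1 truncates to 0.
cast = [int(x) for x in samples]

# Converting: scale to the int16 range first, then round and clamp.
converted = [max(-32768, min(32767, round(x * 32767))) for x in samples]

print(cast)
print(converted)
```

The cast list is all zeros, while the converted list keeps the shape of the signal at integer resolution.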

To convert the samples to integers (PCM format) correctly, you need to rescale them rather than cast them. One way of doing this is given below:

import numpy as np

def float2pcm(sig, dtype='int16'):
    sig = np.asarray(sig)
    dtype = np.dtype(dtype)
    i = np.iinfo(dtype)
    abs_max = 2 ** (i.bits - 1)
    offset = i.min + abs_max
    return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)

You can then call it right after the sd.wait() line and write the converted samples to the file:

write('output.wav', fs, float2pcm(myrecording))

Method 2

A simpler way to solve the problem is to let the sounddevice library do the conversion internally, by requesting 16-bit samples when you record:

import numpy as np
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2, dtype=np.int16)
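
Whichever method is used, it is worth checking that the resulting file is one the wave module (and therefore sr.AudioFile) can open. The following sketch is not part of either answer. It uses only the standard library, synthesizes a short sine tone in place of a real recording, converts it to 16-bit PCM, and reads the file back with `wave.open`. The helper name `write_pcm16`, the file name, and the 440 Hz tone are illustrative assumptions.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(ints), *ints))


fs = 44100
# Stand-in for a recording: 0.1 s of a 440 Hz sine at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 10)]

path = os.path.join(tempfile.mkdtemp(), "output_pcm16.wav")
write_pcm16(path, tone, fs)

# wave.open raised wave.Error on the float file; on 16-bit PCM it should succeed.
with wave.open(path, "rb") as wav:
    params = (wav.getnchannels(), wav.getsampwidth(),
              wav.getframerate(), wav.getnframes())

print(params)
```

A sample width of 2 bytes in the reported parameters confirms that the file holds 16-bit integer PCM.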
Ahsan Memon answered Nov 04 '22 21:11
