Large Wave File not being read in Python

Tags: python, audio, wav

I am trying to do sound analysis on a file in Python. The sound file comes from a high-definition show and is very large (2.39 GB). Whenever I try to open it with the wave module, I get the following error:

wave.Error: unknown format: 65534

I got this file by converting a .ts file into a .wav file. I used the same method on standard-definition shows and it worked just fine. I am able to do some analysis using

data = np.memmap(audioclip,dtype='h',mode='r')

However, this does not give accurate results: it treats the audio clip as 3 hours long when it is only one hour long. Any help would be appreciated. I have found similar questions with different error codes, but they have not helped with this issue. Thank you so much!

clive alton asked Dec 25 '22 22:12

1 Answer

Disclaimer: I don't really know that much about Python.

I googled wave.py and found the following link: http://www.opensource.apple.com/source/python/python-3/python/Lib/wave.py

If you look for the function named _read_fmt_chunk, you'll see the source of the error message. In short, the wave module in that source only supports WAVE_FORMAT_PCM. Format 65534 (0xFFFE) is WAVE_FORMAT_EXTENSIBLE, a format defined by Microsoft and used for multi-channel wave files. It's pretty uncommon.

I think you have a few options:

  1. Find a new method of converting the file that doesn't produce WAVE_FORMAT_EXTENSIBLE.
  2. Modify the source for wave.py to support WAVE_FORMAT_EXTENSIBLE. Assuming the SubFormat field is PCM or IEEE_FLOAT, that wouldn't be a big deal: from that perspective, the format just increases the size of the header. If it is another SubFormat, then you'll need to run an appropriate decoder before you can even get to PCM.
  3. Use another tool to convert the WAVE_FORMAT_EXTENSIBLE .wav file to one which is not. sox may be able to handle this.
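
A variation on option 2 that leaves wave.py alone is to read the header yourself. The following is only a rough sketch, assuming a well-formed RIFF file under 4 GB whose fmt chunk comes before the data chunk. The function name parse_wav_header and the dictionary keys are made up for illustration; they are not part of the wave module.

```python
import struct

WAVE_FORMAT_EXTENSIBLE = 0xFFFE  # decimal 65534, the value in the error message


def parse_wav_header(path):
    """Walk the RIFF chunks and return the format fields plus the
    byte offset and size of the data chunk (hypothetical helper)."""
    info = {}
    with open(path, "rb") as f:
        riff, _riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while "data_offset" not in info:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            padded_size = chunk_size + (chunk_size & 1)  # chunks are word-aligned
            if chunk_id == b"data":
                info["data_offset"] = f.tell()
                info["data_size"] = chunk_size
            elif chunk_id == b"fmt ":
                fmt = f.read(padded_size)
                tag, channels, rate, _byte_rate, _block_align, bits = struct.unpack(
                    "<HHIIHH", fmt[:16]
                )
                extensible = tag == WAVE_FORMAT_EXTENSIBLE and chunk_size >= 40
                if extensible:
                    # The extension holds cbSize, valid bits, channel mask, then
                    # the 16-byte SubFormat GUID starting at byte 24. The first
                    # two bytes of the GUID carry the real format code
                    # (1 = PCM, 3 = IEEE float).
                    tag = struct.unpack("<H", fmt[24:26])[0]
                info.update(
                    format_tag=tag,
                    extensible=extensible,
                    channels=channels,
                    sample_rate=rate,
                    bits_per_sample=bits,
                )
            else:
                f.seek(padded_size, 1)  # skip chunks we do not care about
    if "channels" not in info:
        raise ValueError("fmt chunk missing before data chunk")
    return info
```

If format_tag comes back as 1 with 16 bits per sample, data_offset can be passed as the offset argument of np.memmap so the header bytes are not treated as samples, and the resulting array can be reshaped to (frames, channels).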

Regarding the second part of your question: it's not clear how you are determining the duration of the file, but if you make incorrect assumptions about the number of channels, that could be throwing you off.
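
To illustrate how much the channel count matters, here is a small arithmetic sketch. The question doesn't state the channel count or sample rate, so the 6 channels (5.1 surround), 48 kHz and 16-bit samples below are purely assumptions, as is reading 2.39 GB as 2.39e9 bytes. The function name duration_seconds is made up for the example.

```python
def duration_seconds(data_bytes, channels, sample_rate, bits_per_sample=16):
    """Duration is the number of frames divided by the sample rate,
    where one frame holds one sample for every channel."""
    frame_bytes = channels * bits_per_sample // 8
    return data_bytes / frame_bytes / sample_rate


size = 2.39e9  # approximate file size from the question, header ignored

# Assumed layout: 6 channels, 48 kHz, 16-bit samples
hours_six_channels = duration_seconds(size, channels=6, sample_rate=48000) / 3600

# Same bytes wrongly interpreted as stereo
hours_stereo = duration_seconds(size, channels=2, sample_rate=48000) / 3600

print(round(hours_six_channels, 2), round(hours_stereo, 2))
```

Under those assumptions the six-channel reading comes out a little over one hour, while the stereo reading is exactly three times longer, which would be consistent with the one hour versus three hours described in the question.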

jaket answered Dec 27 '22 11:12