 

Python Wave byte data

Tags: python, audio, wave

I'm trying to read the data from a .wav file.

import os
import wave
# wave.open() does not expand "~", so expand it explicitly
wr = wave.open(os.path.expanduser("~/01 Road.wav"), 'rb')
# sample width is 2 bytes
# number of channels is 2
wave_data = wr.readframes(1)
print(wave_data)

This gives:

b'\x00\x00\x00\x00'

Which is the "first frame" of the song. These 4 bytes obviously correspond to the (2 channels * 2 byte sample width) bytes per frame, but what does each byte correspond to?

In particular, I'm trying to convert it to a mono amplitude signal.

asked Dec 19 '13 09:12 by jameh


3 Answers

If you want to understand what a 'frame' is, you will have to read the WAVE file format specification. For instance: https://web.archive.org/web/20140221054954/http://home.roadrunner.com/~jgglatt/tech/wave.htm

From that document:

The sample points that are meant to be "played" ie, sent to a Digital to Analog Converter(DAC) simultaneously are collectively called a sample frame. In the example of our stereo waveform, every two sample points makes up another sample frame. This is illustrated below for that stereo example.

sample       sample              sample
frame 0      frame 1             frame N
 _____ _____ _____ _____         _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____|       |_____|_____|
 _____
|     | = one sample point
|_____|
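
As a minimal sketch of that layout, assuming 16-bit little-endian PCM (the helper name split_channels and the sample bytes are made up for illustration), the interleaved bytes returned by readframes() can be separated into one list per channel:

```python
import struct

def split_channels(frames, nchannels=2, sampwidth=2):
    """Split interleaved 16-bit PCM frame bytes into one list of samples per channel."""
    if sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    count = len(frames) // sampwidth
    # "<" = little-endian (WAV byte order), "h" = signed 16-bit integer
    samples = struct.unpack("<%dh" % count, frames)
    # Sample i belongs to channel i % nchannels
    return [list(samples[ch::nchannels]) for ch in range(nchannels)]

# Two stereo frames: (ch1=1, ch2=-1), then (ch1=256, ch2=2)
frames = b"\x01\x00\xff\xff\x00\x01\x02\x00"
left, right = split_channels(frames)
print(left, right)  # [1, 256] [-1, 2]
```

So in the 4-byte frame from the question, bytes 0-1 are the channel 1 sample and bytes 2-3 are the channel 2 sample.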

To convert 16-bit stereo to mono you could do something like this (Python 3),

import struct
import wave

def stereo_to_mono(left, right):
    """Average two signed 16-bit samples."""
    return (left + right) // 2

with wave.open('piano2.wav', 'rb') as wr:
    nchannels, sampwidth, framerate, nframes, comptype, compname = wr.getparams()
    frames = wr.readframes(nframes)

# Assumes 16-bit stereo PCM: every frame holds two little-endian
# signed shorts, one for ch1 and one for ch2.
samples = struct.unpack('<%dh' % (len(frames) // 2), frames)

# Even indices are ch1 samples, odd indices are ch2 samples.
mono = [stereo_to_mono(s1, s2) for s1, s2 in zip(samples[0::2], samples[1::2])]
new_frames = struct.pack('<%dh' % len(mono), *mono)

with wave.open('piano_mono.wav', 'wb') as ww:
    ww.setparams((1, sampwidth, framerate, len(mono), comptype, compname))
    ww.writeframes(new_frames)

There is no clear-cut way to go from stereo to mono. You could just drop one channel. Above, I am averaging the channels. It all depends on your application.
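
For the "just drop one channel" option, a rough sketch that works directly on the raw frame bytes could look like the following. The function name keep_channel and the sample bytes are hypothetical, and it assumes uncompressed PCM with a fixed sample width:

```python
def keep_channel(frames, channel=0, nchannels=2, sampwidth=2):
    """Return mono frame bytes containing only the chosen channel."""
    frame_size = nchannels * sampwidth
    start = channel * sampwidth
    out = bytearray()
    # Walk the data one whole frame at a time and copy a single sample from each
    for offset in range(0, len(frames) - frame_size + 1, frame_size):
        out += frames[offset + start : offset + start + sampwidth]
    return bytes(out)

frames = b"\x01\x00\xff\xff\x00\x01\x02\x00"
print(keep_channel(frames, channel=0))  # b'\x01\x00\x00\x01'
print(keep_channel(frames, channel=1))  # b'\xff\xff\x02\x00'
```

Because no sample is decoded, this variant does not depend on byte order or signedness; the result can be passed to writeframes() on a file opened with one channel.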

answered Sep 21 '22 06:09 by William Denman


For wav file IO I prefer to use scipy. It is perhaps overkill for reading a wav file, but generally after reading the wav it is easier to do downstream processing.

import scipy.io.wavfile
fs1, y1 = scipy.io.wavfile.read(filename)

From here, y1 is an array of N samples with one column per channel (for a mono file it is one-dimensional). You don't say how you'd like to convert to mono: you can take the average, or whatever else suits you. For the average, use

monoChannel = y1.mean(axis=1)
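
Note that mean() on an integer array returns float64 values. If the result should stay in the original integer sample format (for example to write it back as 16-bit PCM), one possible sketch is to sum in a wider integer type and cast back. The helper name average_to_mono and the test array are made up for illustration, and only NumPy is assumed:

```python
import numpy as np

def average_to_mono(data):
    """Average the channels of an (N, channels) integer array, keeping its dtype."""
    if data.ndim == 1:
        return data  # already mono
    # Sum in a wider type so two int16 samples cannot overflow, then cast back.
    mono = data.astype(np.int64).sum(axis=1) // data.shape[1]
    return mono.astype(data.dtype)

stereo = np.array([[32767, 32767], [-32768, -32768], [100, -100]], dtype=np.int16)
mono = average_to_mono(stereo)
print(mono, mono.dtype)
```
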
answered Sep 20 '22 06:09 by Paul


As a direct answer to your question: two bytes make one 16-bit integer value in the "usual" (little-endian) way, given by the explicit formula: value = data[0] + 256 * data[1]. (In Python 2, where indexing a byte string yields a 1-character string, wrap each term in ord().) But using the struct module is a better way to decode (and later re-encode) such multibyte integers:

import struct
# "<" forces little-endian, the byte order used by WAV files
print(struct.unpack("<HH", b"\x00\x00\x00\x00"))
# -> gives a 2-tuple of integers, here (0, 0)

or, if we want signed 16-bit integers (which is the case for 16-bit PCM .wav files), use "<hh" instead of "<HH". (I leave to you the task of figuring out how exactly two bytes can encode an integer value from -32768 to 32767 :-)
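
For readers who want to check their answer to that exercise, here is a small sketch of the two's-complement rule next to the struct result. The helper name to_signed16 and the sample bytes are invented for illustration:

```python
import struct

def to_signed16(low, high):
    """Combine two bytes (little-endian) into a signed 16-bit integer."""
    value = low + 256 * high          # unsigned value, 0..65535
    if value >= 0x8000:               # top bit set means negative
        value -= 0x10000
    return value

data = b"\xff\xff\x00\x80"
manual = (to_signed16(data[0], data[1]), to_signed16(data[2], data[3]))
print(manual)                          # (-1, -32768)
print(struct.unpack("<hh", data))      # (-1, -32768)
print(int.from_bytes(data[0:2], "little", signed=True))  # -1
```
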

answered Sep 21 '22 06:09 by Armin Rigo