
What are chunks, samples and frames when using pyaudio

Tags: python, python-2.7, audio, sampling, pyaudio

After going through the pyaudio documentation and reading some other articles on the web, I am not sure whether my understanding is correct.

This is the code for audio recording found on pyaudio's site:

    import pyaudio
    import wave

    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "output.wav"

    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

    print("* recording")

    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    print("* done recording")

    stream.stop_stream()
    stream.close()
    p.terminate()

and if I add these lines then I am able to play whatever I recorded:

    play = pyaudio.PyAudio()
    stream_play = play.open(format=FORMAT,
                            channels=CHANNELS,
                            rate=RATE,
                            output=True)
    for data in frames:
        stream_play.write(data)
    stream_play.stop_stream()
    stream_play.close()
    play.terminate()

1. "RATE" is the number of samples collected per second.
2. "CHUNK" is the number of frames in the buffer.
3. Each frame will have 2 samples as "CHANNELS=2".
4. Size of each sample is 2 bytes, calculated using the function: pyaudio.get_sample_size(pyaudio.paInt16).
5. Therefore size of each frame is 4 bytes.
6. In the "frames" list, size of each element must be 1024*4 bytes, for example, size of frames[0] must be 4096 bytes. However, sys.getsizeof(frames[0]) returns 4133, but len(frames[0]) returns 4096.
7. The for loop executes int(RATE / CHUNK * RECORD_SECONDS) times, and I can't understand why. Here is the same question answered by "Ruben Sanchez", but I can't be sure it's correct, as he says CHUNK=bytes. According to his explanation, it should be int(RATE / (CHUNK*2) * RECORD_SECONDS), as (CHUNK*2) is the number of samples read into the buffer with each iteration.
8. Finally, when I write print frames[0], it prints gibberish, because it tries to treat the string as ASCII-encoded text, which it is not; it is just a stream of bytes. So how do I print this stream of bytes in hexadecimal using the struct module? And if I later replace each of the hexadecimal values with values of my choice, will it still produce a playable sound?

Everything above reflects my current understanding, and much of it may be wrong.

350

asked Mar 13 '16 12:03

shiva

1 Answer

1. "RATE" is the "sampling rate", i.e. the number of frames per second.
2. "CHUNK" is the (arbitrarily chosen) number of frames the (potentially very long) signals are split into in this example.
3. Yes, each frame will have 2 samples as "CHANNELS=2", but the term "samples" is seldom used in this context (because it is confusing).
4. Yes, the size of each sample is 2 bytes (= 16 bits) in this example.
5. Yes, the size of each frame is 4 bytes.
6. Yes, each element of "frames" should be 4096 bytes. sys.getsizeof() reports the storage space needed by the Python interpreter for the whole object, which is typically a bit more than the size of the raw data; len() gives the number of bytes of audio data.
7. RATE * RECORD_SECONDS is the number of frames that should be recorded. Since the for loop is not repeated for each frame but only for each chunk, that number has to be divided by the chunk size CHUNK to get the number of loop iterations. This has nothing to do with samples, so there is no factor of 2 involved.
8. If you really want to see the hexadecimal values, you can try something like [hex(x) for x in frames[0]] in Python 3. In Python 2, where frames[0] is a str, use [hex(ord(x)) for x in frames[0]] instead. If you want to get the actual 2-byte numbers, use the struct module with the format character 'h' (signed 16-bit, matching paInt16), for example '<h' for a single little-endian sample.
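The arithmetic behind points 4 to 7 can be checked without any audio hardware. The following is only a minimal sketch using the constants from the question. SAMPLE_WIDTH is hard-coded to 2 on the assumption that this is what pyaudio.get_sample_size(pyaudio.paInt16) returns, and the derived variable names are made up for illustration.

```python
CHUNK = 1024         # frames per read
CHANNELS = 2         # samples per frame
RATE = 44100         # frames per second
RECORD_SECONDS = 5
SAMPLE_WIDTH = 2     # bytes per 16-bit sample (assumed, see lead-in)

bytes_per_frame = CHANNELS * SAMPLE_WIDTH     # one sample per channel
bytes_per_chunk = CHUNK * bytes_per_frame     # what len(frames[0]) reports
total_frames = RATE * RECORD_SECONDS          # frames wanted in total
num_reads = int(RATE / CHUNK * RECORD_SECONDS)  # loop iterations
frames_recorded = num_reads * CHUNK           # frames actually read

print(bytes_per_frame)    # 4
print(bytes_per_chunk)    # 4096
print(total_frames)       # 220500
print(num_reads)          # 215
print(frames_recorded)    # 220160
```

Because the loop count is truncated to a whole number of chunks, the recording ends up slightly shorter than 5 seconds (220160 instead of 220500 frames). Dividing by CHUNK*2 instead would read only about half of the requested frames.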

You might be interested in my tutorial about reading WAV files with the wave module, which covers some of your questions in more detail: http://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/audio-files/audio-files-with-wave.ipynb
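To make point 8 more concrete, here is a small sketch that works on a hand-made byte string rather than a real recording, so it runs without pyaudio. The byte values and the names chunk, samples, halved and new_chunk are made up for illustration. It assumes 16-bit signed little-endian samples with the left and right channels interleaved, which is what paInt16 with CHANNELS = 2 typically produces on common hardware.

```python
import binascii
import struct

# Two stereo frames of 16-bit samples: (L=1, R=-1) and (L=256, R=-32768).
chunk = struct.pack('<4h', 1, -1, 256, -32768)

# Hex dump of the raw bytes; hexlify works in both Python 2 and 3.
print(binascii.hexlify(chunk))

# Decode the bytes: one 'h' (signed 16-bit integer) per sample.
num_samples = len(chunk) // 2
samples = struct.unpack('<%dh' % num_samples, chunk)

# Interleaved layout: even positions are left, odd positions are right.
left = samples[0::2]
right = samples[1::2]

# Change the values (here: halve them) and pack them back into bytes.
halved = [s // 2 for s in samples]
new_chunk = struct.pack('<%dh' % num_samples, *halved)
```

As long as every value stays within the 16-bit range (-32768 to 32767) and the length remains a whole number of frames, a byte string like new_chunk should still be playable with stream_play.write(). Whether it sounds like anything meaningful depends entirely on the values chosen.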

138

answered Sep 25 '22 18:09

Matthias
