What does a audio frame contain?

Tags:

I'm doing some research on how to compare sound files(wave). Basically, I want to compare stored soundfiles (wav) with sound from a microphone. So in the end I would like to pre-store some voice commands of my own and then when I'm running my app I would like to compare the pre-stored files with input from the microphone.

My thought was to put in some margin when comparing because saying something two times in a row in the exactly same way would be difficult I guess.

So after some googling I see that Python has this module named wave and the Wave_read object. That object has a function named readframes(n):

Reads and returns at most n frames of audio, as a string of bytes.

What do these bytes contain? I'm thinking of looping thru the wave files one frame at the time comparing them frame by frame.

706

asked Oct 18 '10 06:10

Jason94

2 Answers

An audio frame, or sample, contains amplitude (loudness) information at that particular point in time. To produce sound, tens of thousands of frames are played in sequence to produce frequencies.

In the case of CD quality audio or uncompressed wave audio, there are around 44,100 frames/samples per second. Each of those frames contains 16-bits of resolution, allowing for fairly precise representations of the sound levels. Also, because CD audio is stereo, there is actually twice as much information, 16-bits for the left channel, 16-bits for the right.

When you use the sound module in python to get a frame, it will be returned as a series of hexadecimal characters:

One character for an 8-bit mono signal.
Two characters for 8-bit stereo.
Two characters for 16-bit mono.
Four characters for 16-bit stereo.

In order to convert and compare these values you'll have to first use the python wave module's functions to check the bit depth and number of channels. Otherwise, you'll be comparing mismatched quality settings.

179

answered Oct 06 '22 18:10

Soviut

A simple byte-by-byte comparison has almost no chance of a successful match, even with some tolerance thrown in. Voice-pattern recognition is a very complex and subtle problem that is still the subject of much research.

answered Oct 06 '22 17:10

Marcelo Cantos

Related questions
                            
                                What's a quaternion rotation?
                            
                                Best way to convert string to decimal separator "." and "," insensitive way?
                            
                                How to subtract 4 months from today's date?
                            
                                python -m SimpleHTTPServer - Listening on 0.0.0.0:8000 but http://0.0.0.0:8000/test.html gives "Page Not Found"
                            
                                A NSFetchedResultsController with date as sectionNameKeyPath
                            
                                How to change returned ContentType in ASP.NET MVC controller (ActionResult)
                            
                                Enter key on EditText hitting onKey twice
                            
                                CSS: unexpected vertical position of "inline-block" elements
                            
                                is there a way to append leading zero's to an ordered number list? (01 or 001 as opposed to just 1)
                            
                                How to disable horizontal scroll bar in FlowLayoutPanel?
                            
                                You must supply a resource ID for a TextView android error
                            
                                Breaking java generics naming convention? [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With