We are working on a project that records voice from an external microphone. For analysis purposes, we need a sample rate of about 5 kHz.
We are using AVAudioEngine to record the voice. We know Apple devices aren't able to record at an arbitrary rate, so we are using AVAudioConverter to downsample to the rate we need.
Our understanding is that this is similar to compression: the lower we push the sample rate, the smaller the file gets, and the duration seems to shrink the same way. That is what we are currently seeing (correct us if we are wrong about this).
Issue
**The issue is that downgrading the sample rate shortens the file length, which breaks our calculations and analysis. For example, a 1-hour recording came out as 45 minutes, so if we run our analysis over 5-minute intervals, the results go wrong.
What would be the best solution for this?**
Query
We have searched the internet but could not figure out how the bufferSize parameter of installTap affects the output. In the current code, we have set it to 2688.
Can anyone clarify?
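For reference, the bufferSize passed to installTap is only a request for how many frames each tap callback delivers (the system may adjust it), so it changes the callback cadence and latency, not the total amount of audio captured. A rough sketch of the arithmetic, assuming a 44.1 kHz hardware rate (the real rate must be read from the input node):

// Sketch: what a 2688-frame tap buffer means in time, assuming 44.1 kHz hardware.
// The actual rate should be read from inputNode.inputFormat(forBus: 0).
let tapBufferSize: AVAudioFrameCount = 2688
let assumedHardwareRate = 44_100.0
let secondsPerCallback = Double(tapBufferSize) / assumedHardwareRate
print("≈\(secondsPerCallback * 1000) ms of audio per tap callback") // ≈61 ms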
Code
import AVFoundation

let bus = 0
let inputNode = engine.inputNode

// Band-pass the voice: low-pass at 3 kHz plus high-pass at 1 kHz
let equalizer = AVAudioUnitEQ(numberOfBands: 2)
equalizer.bands[0].filterType = .lowPass
equalizer.bands[0].frequency = 3000
equalizer.bands[0].bypass = false
equalizer.bands[1].filterType = .highPass
equalizer.bands[1].frequency = 1000
equalizer.bands[1].bypass = false
engine.attach(equalizer) // Attach equalizer

// Connect nodes: input -> equalizer -> main mixer
engine.connect(inputNode, to: equalizer, format: inputNode.inputFormat(forBus: 0))
engine.connect(equalizer, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))

// Call before creating the converter because this changes the main mixer's output format
engine.prepare()

let outputFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                 sampleRate: 5000,
                                 channels: 1,
                                 interleaved: false)!

// Downsampling converter
guard let converter = AVAudioConverter(from: engine.mainMixerNode.outputFormat(forBus: 0),
                                       to: outputFormat) else {
    print("Can't convert into this format")
    return
}

engine.mainMixerNode.installTap(onBus: bus, bufferSize: 2688, format: nil) { buffer, time in
    // Hand the tap buffer to the converter exactly once, then report .noDataNow
    // so convert(to:error:withInputFrom:) returns instead of waiting for more input
    var newBufferAvailable = true
    let inputCallback: AVAudioConverterInputBlock = { inNumPackets, outStatus in
        if newBufferAvailable {
            outStatus.pointee = .haveData
            newBufferAvailable = false
            return buffer
        } else {
            outStatus.pointee = .noDataNow
            return nil
        }
    }

    // Size the output buffer by the ratio of output to input sample rates
    let convertedBuffer = AVAudioPCMBuffer(
        pcmFormat: outputFormat,
        frameCapacity: AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength
            / AVAudioFrameCount(buffer.format.sampleRate))!

    var error: NSError?
    let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputCallback)
    assert(status != .error)
    if status == .haveData {
        // Process with converted buffer
    }
}

do {
    try engine.start()
} catch {
    print("Can't start the engine: \(error)")
}
Expected Result
We are fine with the buffers being compressed, but we would like the output file to keep the same recording duration. If we record for 10 minutes, the output file should contain 10 minutes of data.
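One way to achieve that, sketched below assuming the outputFormat, status, and convertedBuffer from the code above are in scope (recordingURL and outputFile are hypothetical names): create an AVAudioFile whose settings come from outputFormat, so the file is stamped as 5 kHz mono, and write every converted buffer into it. Duration is then frame count divided by 5000, so 10 minutes of input yields 10 minutes of output.

// A minimal sketch; `recordingURL` is a hypothetical destination
// (.caf accepts plain PCM without container restrictions).
let recordingURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("capture.caf")
guard let outputFile = try? AVAudioFile(forWriting: recordingURL,
                                        settings: outputFormat.settings,
                                        commonFormat: .pcmFormatInt16,
                                        interleaved: false) else { return }

// Inside the tap, after convert(to:error:withInputFrom:) returns:
if status == .haveData {
    do {
        try outputFile.write(from: convertedBuffer) // frames accumulate at 5 kHz
    } catch {
        print("Write failed: \(error)")
    }
}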
Audio for video (including dubbed voice-over) uses a sample rate of 48,000 Hz (or 48 kHz), which was adopted early on because audio at this rate can be synchronized to film and video in the US and Europe. Likewise, telephone audio is often recorded at 8 kHz because that matches the frequencies carried by phone lines.
An audio engine object contains a group of AVAudioNode instances that you attach to form an audio processing chain. You can connect, disconnect, and remove audio nodes during runtime with minor limitations. Removing an audio node that has differing channel counts, or that’s a mixer, can break the graph.
A standard sample rate is 44,100 Hz – audio recorded at 44,100 samples per second. That figure can also be written as 44.1 kHz. This rate was adopted for CDs and carried over to the first MP3s, and for that reason it is still the go-to for most voice-over recordings.
By default, Audio Engine renders to a connected audio device in real time. You can configure the engine to operate in manual rendering mode when you need to render at, or faster than, real time. In that mode, the engine disconnects from audio devices and your app drives the rendering.
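For completeness, a minimal sketch of enabling that mode (separate from the tap-based recording above; the mode must be enabled while the engine is stopped, and a source node such as a scheduled AVAudioPlayerNode has to feed the graph for offline rendering to produce audio; targetFrames is a hypothetical total frame count):

// Sketch: offline manual rendering at 44.1 kHz mono.
let offlineFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                  sampleRate: 44_100,
                                  channels: 1,
                                  interleaved: false)!
try engine.enableManualRenderingMode(.offline,
                                     format: offlineFormat,
                                     maximumFrameCount: 4096)
try engine.start()

let renderBuffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                    frameCapacity: engine.manualRenderingMaximumFrameCount)!
while engine.manualRenderingSampleTime < targetFrames {
    if try engine.renderOffline(renderBuffer.frameCapacity, to: renderBuffer) == .success {
        // Consume renderBuffer (e.g., write it to an AVAudioFile)
    }
}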
Digitized audio doesn't have an intrinsic duration since it can be played back at any sample rate.
In order for the resulting file's duration to be what you expect, the sample rates have to be what you expect at each stage: Recording, processing, and playback.
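As a quick worked check (recordingURL being the hypothetical file from the sketch above): duration is frame count divided by the rate stamped on the file, so "1 hour plays back as 45 minutes" means those two numbers disagree by a factor of 0.75 somewhere in the chain.

// Sketch: derive a file's duration from its frame count and labeled rate.
if let file = try? AVAudioFile(forReading: recordingURL) {
    let seconds = Double(file.length) / file.fileFormat.sampleRate
    print("\(file.length) frames at \(file.fileFormat.sampleRate) Hz = \(seconds) s")
    // e.g. 3_000_000 frames at 5000 Hz -> 600 s = 10 minutes
}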
I suspect that one of two things is happening:
A) The sample rate of the buffer you receive inside installTap is not what you assumed it would be, and you are converting from the wrong format.
B) You are playing back your audio at a sample rate different from what you assume it is. (How do you know that your player is playing at 5000 Hz?)
In order to check this, you would have to break the process down into smaller pieces and check the sample rate at each stage.
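A sketch of that breakdown, using only prints (note that AVAudioNode allows one tap per bus, so this tap would replace the converting tap while you diagnose):

// Sketch: log the sample rate at each stage of the chain.
print("hardware input:", engine.inputNode.inputFormat(forBus: 0).sampleRate)
print("mixer output:", engine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
engine.mainMixerNode.installTap(onBus: 0, bufferSize: 2688, format: nil) { buffer, _ in
    print("tap buffer:", buffer.format.sampleRate, "Hz,", buffer.frameLength, "frames")
}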