Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AvaudioEngine - Record voice at specific sample rate AvaudioEngine for Analysis

we are working on a project which records voice from an external microphone. For analysis purposes, we need to have a sample rate of about 5k Hz.

We are using AvAudioEngine to record a voice. We know Apple devices want able to record at a specific rate, so we are using AVAudioConverter to downgrade the sample rate.

But as you know it is similar to the compression, so the lower we reduce sample rate, file size and file duration affect the same. Which is currently happening(Correct me if I am wrong in this).

Issue

**Issue is downgrading sample rate shorter the file length and its effects on calculation & analysis. For example, a 1-hour recording was downgraded to 45 mins. So suppose if we are making analysis on 5 minute period interval, it goes wrong

What will be the best solution for this?**

Query

We have searched over the internet but we could not figure out how buffer size on installTap affects? In the current code, we have set it to 2688.

Can anyone clarify?

Code

let bus = 0
let inputNode = engine.inputNode

let equalizer = AVAudioUnitEQ(numberOfBands: 2)

equalizer.bands[0].filterType = .lowPass
equalizer.bands[0].frequency = 3000
equalizer.bands[0].bypass = false

equalizer.bands[1].filterType = .highPass
equalizer.bands[1].frequency = 1000
equalizer.bands[1].bypass = false
engine.attach(equalizer) //Attach equalizer

// Connect nodes
engine.connect(inputNode, to: equalizer, format: inputNode.inputFormat(forBus: 0))
engine.connect(equalizer, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))

// call before creating converter because this changes the mainMixer's output format
engine.prepare()

let outputFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                 sampleRate: 5000,
                                 channels: 1,
                                 interleaved: false)!

// Downsampling converter
guard let converter: AVAudioConverter = AVAudioConverter(from: engine.mainMixerNode.outputFormat(forBus: 0), to: outputFormat) else {
    print("Can't convert in to this format")
    return
}

engine.mainMixerNode.installTap(onBus: bus, bufferSize: 2688, format: nil) { (buffer, time) in
    var newBufferAvailable = true
    
    let inputCallback: AVAudioConverterInputBlock = { inNumPackets, outStatus in
        if newBufferAvailable {
            outStatus.pointee = .haveData
            newBufferAvailable = false
            return buffer
        } else {
            outStatus.pointee = .noDataNow
            return nil
        }
    }
    
    
    let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate))!
    
    var error: NSError?
    let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputCallback)
    assert(status != .error)
    
    
    if status == .haveData {
        // Process with converted buffer
    }
}

do {
    try engine.start()
} catch {
    print("Can't start the engine: \(error)")
}

Expecting Result

We are fine with compression of buffer but We would like to have the same recording duration in the output file. If we record for 10 minutes output file should have 10 minutes of data.

like image 379
Bhavin Vaghela Avatar asked Dec 22 '21 07:12

Bhavin Vaghela


People also ask

What is the sample rate for audio files?

Audio for video (including dubbing voice-over) uses a sample rate of 48,000 Hz (or 48 kHz), which was adopted early on because audio at this rate can be synchronized to film and video in the US and Europe. Likewise, the audio used in phone directories is often recorded at 8 kHz because it works with the frequencies used in phone lines.

What is an audio engine object?

An audio engine object contains a group of AVAudioNode instances that you attach to form an audio processing chain. You can connect, disconnect, and remove audio nodes during runtime with minor limitations. Removing an audio node that has differing channel counts, or that’s a mixer, can break the graph.

What is the sample rate for voice-over?

A standard sample rate is 44,100 Hz – that’s audio that’s recorded at 44,100 samples per second. That figure can also be represented as 44.1 kHz, or 44.1 thousands of Hz. This rate was adopted for CDs, and was carried on to the first MP3s as well, and for this reason is still the go-to for most voice-over recordings.

How does audio engine render to audio devices?

By default, Audio Engine renders to a connected audio device in real time. You can configure the engine to operate in manual rendering mode when you need to render at, or faster than, real time. In that mode, the engine disconnects from audio devices and your app drives the rendering.


1 Answers

Digitized audio doesn't have an intrinsic duration since it can be played back at any sample rate.

In order for the resulting file's duration to be what you expect, the sample rates have to be what you expect at each stage: Recording, processing, and playback.

I suspect that one of two possible things is happening:

A) the sample rate of the buffer you receive inside installtap is not what you assumed it would be... and you are converting from the wrong format.

B) You are playing back your audio at sample rates than are different that what you are assuming they are. (How do you know that your player is playing at 5000hz)?

In order to check this, you would have to break the process down into smaller pieces and check the sample rate at each stage.

like image 166
Nerdy Bunz Avatar answered Oct 21 '22 08:10

Nerdy Bunz