 

How can I obtain the raw audio frames from the microphone in real-time or from a saved audio file in iOS?

I am trying to extract MFCC vectors from the audio signal as input into a recurrent neural network. However, I am having trouble figuring out how to obtain the raw audio frames in Swift using Core Audio. Presumably, I have to go low-level to get that data, but I cannot find helpful resources in this area.

How can I get the audio signal information that I need using Swift?

Edit: This question was flagged as a possible duplicate of How to capture audio samples in iOS with Swift?. However, that particular question does not have the answer that I am looking for. Namely, the solution to that question is the creation of an AVAudioRecorder, which is a component, not the end result, of a solution to my question.

The question How to convert WAV/CAF file's sample data to byte array? is closer to what I am after. However, its solutions are written in Objective-C, and I am wondering whether there is a way to do the same in Swift.

asked Dec 02 '22 by macklinagent


1 Answer

Attaching a tap to the default input node on AVAudioEngine is pretty straightforward and will get you real-time chunks of audio from the microphone, roughly 100 ms each, as Float32 arrays. You don't even have to connect any other audio units. If your MFCC extractor and network are sufficiently responsive, this may be the easiest way to go.

import AVFoundation

let audioEngine = AVAudioEngine()
let inputNode = audioEngine.inputNode   // non-optional in current SDKs

inputNode.installTap(onBus: 0,          // bus 0 of the (mono) input
                     bufferSize: 1000,  // a request, not a guarantee
                     format: nil,       // nil = no format translation
                     block: { buffer, when in

    // This block will be called over and over for successive buffers
    // of microphone data until you stop() the AVAudioEngine.
    let actualSampleCount = Int(buffer.frameLength)

    // floatChannelData points at the per-channel sample arrays;
    // pointee is channel 0, so channelData[n] holds sample n.
    guard let channelData = buffer.floatChannelData?.pointee else { return }
    for i in 0..<actualSampleCount {
        let val = channelData[i]
        // do something to each sample here...
    }
})

do {
    try audioEngine.start()
} catch let error as NSError {
    print("Got an error starting audioEngine: \(error.domain), \(error)")
}

You will need to request and obtain microphone permission as well.
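As a sketch, the request itself can look like the following; this uses AVAudioSession's requestRecordPermission, and note that the app's Info.plist also needs an NSMicrophoneUsageDescription entry, or iOS will terminate the app on first microphone access:

import AVFoundation

// Ask the user for microphone access before starting the engine.
AVAudioSession.sharedInstance().requestRecordPermission { granted in
    if granted {
        // Safe to install the tap and start the AVAudioEngine here.
    } else {
        // No input available; handle the denial gracefully.
    }
}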

I find the amplitudes to be rather low, so you may need to apply some gain or normalization depending on your network's needs.
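If you go the normalization route, per-buffer peak normalization is one simple option. The helper below is hypothetical, and whether per-buffer scaling suits an MFCC pipeline is an assumption worth checking against your feature extractor:

// Hypothetical helper: scale a buffer of samples so its peak is 1.0.
func normalized(_ samples: [Float]) -> [Float] {
    guard let peak = samples.map({ abs($0) }).max(), peak > 0 else {
        return samples   // silent buffer; nothing to scale
    }
    return samples.map { $0 / peak }
}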

To process your WAV files, I'd try AVAssetReader, though I don't have code at hand for that.
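For what it's worth, a rough sketch of the AVAssetReader approach might look like this; the output settings (mono, 32-bit float, interleaved) are assumptions you should adjust to your files:

import AVFoundation

// Sketch: read an audio file's samples as mono Float32 PCM.
func readSamples(from url: URL) throws -> [Float] {
    let asset = AVURLAsset(url: url)
    guard let track = asset.tracks(withMediaType: .audio).first else { return [] }

    let reader = try AVAssetReader(asset: asset)
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVLinearPCMBitDepthKey: 32,
        AVLinearPCMIsFloatKey: true,
        AVLinearPCMIsNonInterleaved: false,
        AVNumberOfChannelsKey: 1
    ]
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
    reader.add(output)
    reader.startReading()

    var samples = [Float]()
    while let sampleBuffer = output.copyNextSampleBuffer(),
          let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
        let length = CMBlockBufferGetDataLength(blockBuffer)
        var chunk = [Float](repeating: 0, count: length / MemoryLayout<Float>.size)
        CMBlockBufferCopyDataBytes(blockBuffer, atOffset: 0,
                                   dataLength: length, destination: &chunk)
        samples.append(contentsOf: chunk)
    }
    return samples
}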

answered Dec 29 '22 by Jason Campbell