I want to have two audio files and mix and play it programmatically. When I am playing the first audio file, after some time(dynamic time) I need to add the second small audio file with the first audio file when somewhere middle of the first audio file is playing, then finally I need to save as one audio file on the device. It should play the audio file with the mixer audio I included the second one.
I have gone through many forums, but couldn't get the clue exactly how to achieve this?
Could someone please clarify my below doubts?
I don't know how to achieve this. Please suggest your thoughts!
In this case, what audio file/format I should use? Can I use .avi files?
You can choose a compressed or non-compressed format. Common non-compressed formats include Wav and AIFF. CAF can represent compressed and non compressed data. .avi is not an option (offered by the OS).
If the files are large and storage space (on disk) is a concern, you may consider AAC format saved in a CAF (or simply .m4a). For most applications, 16 bit samples will be enough, and you can also save space, memory and cpu by saving these files at an appropriate sample rate (ref: CDs are 44.1kHz).
Since ExtAudioFile interface abstract the conversion process, you should not have to change your program to compare size and speed differences of compressed and non-compressed formats for your distribution (AAC in CAF would be fine for normal applications).
Noncompressed CD quality audio will consume about 5.3 MB per minute, per channel. So if you have 2 stereo audio files, each 3 minutes long, and a 3 minute destination buffer, your memory requirement would be around 50 MB.
Since you have 'minutes' of audio, you may need to consider avoiding loading all audio data into memory at once. In order to read, manipulate, and combine audio, you will need a non-compressed representation to work with in memory, so compression formats would not help here. As well, converting a compressed representation to pcm takes a good amount of resources; reading a compressed file, although fewer bytes, can take more (or less) time.
How to add the second audio after the dynamic time set onto the first audio file programmatically? For ex: If the first audio total time is 2 mins, I might need to mix the second audio file (3 seconds audio) somewhere in 1 min or 1.5 mins or 55 seconds of the first file. Its dynamic.
To read the files and convert them to the format you want to use, use ExtAudioFile APIs - this will convert to your destination sample format for you. Common PCM sample representations in memory include SInt32
, SInt16
, and float
, but that can vary wildly based on the application and the hardware (beyond iOS). ExtAudioFile APIs would also convert compressed formats to PCM, if needed.
Your input audio files should have the same sample rate. If not, you will have to resample the audio, a complex process which also takes a lot of resources (if done correctly/accurately). If you need to support resampling, double the time you've allocated to completing this task (not detailing the process here).
To add the sounds, you would request PCM samples from the files, process, and write to the output file (or buffer in memory).
To determine when to add the other sounds, you will need to get the sample rates for the input files (via ExtAudioFileGetProperty). If you want to write the second sound to the destination buffer at 55s, then you would start adding the sounds at sample number SampleRate * 55
, where SampleRate
is the sample rate of the files you are reading.
To mix audio, you will just use this form (pseudocode):
mixed[i] = fileA[i] + fileB[i];
but you have to be sure you avoid over/underflow and other arithmetic errors. Typically, you will perform this process using some integer value, because floating point calculations can take a long time (when there are so many). For some applications, you could just shift and add with no worry of overflow - this would effectively reduce each input by one half before adding them. The amplitude of the result would be one half. If you have control over the files' content (e.g. they are all bundled as resources) then you could simply ensure no peak sample in the files exceeded one half of the full scale value (about -6dBFS). Of course, saving as float would solve this issue at the expense of introducing higher CPU, memory, and file i/o demands.
At this point, you'd have 2 files open for reading, and one open for writing, then a few small temporary buffers for processing and mixing the inputs before writing to the output file. You should perform these requests in blocks for efficiency (e.g. read 1024 samples from each file, process the samples, write 1024 samples). The APIs don't guarantee much regarding caching and buffering for efficiency.
How to save the final output audio file on the device? If I save the audio file programmatically somewhere, can I play back again?
ExtAudioFile APIs would work for your read and writing needs. Yes, you can read/play it later.
Hello You can do this by using av foundation
- (BOOL) combineVoices1
{
NSError *error = nil;
BOOL ok = NO;
NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectory = [paths objectAtIndex:0];
CMTime nextClipStartTime = kCMTimeZero;
//Create AVMutableComposition Object.This object will hold our multiple AVMutableCompositionTrack.
AVMutableComposition *composition = [[AVMutableComposition alloc] init];
AVMutableCompositionTrack *compositionAudioTrack = [composition addMutableTrackWithMediaType:AVMediaTypeAudio preferredTrackID:kCMPersistentTrackID_Invalid];
[compositionAudioTrack setPreferredVolume:0.8];
NSString *soundOne =[[NSBundle mainBundle]pathForResource:@"test1" ofType:@"caf"];
NSURL *url = [NSURL fileURLWithPath:soundOne];
AVAsset *avAsset = [AVURLAsset URLAssetWithURL:url options:nil];
NSArray *tracks = [avAsset tracksWithMediaType:AVMediaTypeAudio];
AVAssetTrack *clipAudioTrack = [[avAsset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
[compositionAudioTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, avAsset.duration) ofTrack:clipAudioTrack atTime:kCMTimeZero error:nil];
AVMutableCompositionTrack *compositionAudioTrack1 = [composition addMutableTrackWithMediaType:AVMediaTypeAudio preferredTrackID:kCMPersistentTrackID_Invalid];
[compositionAudioTrack setPreferredVolume:0.3];
NSString *soundOne1 =[[NSBundle mainBundle]pathForResource:@"test" ofType:@"caf"];
NSURL *url1 = [NSURL fileURLWithPath:soundOne1];
AVAsset *avAsset1 = [AVURLAsset URLAssetWithURL:url1 options:nil];
NSArray *tracks1 = [avAsset1 tracksWithMediaType:AVMediaTypeAudio];
AVAssetTrack *clipAudioTrack1 = [[avAsset1 tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
[compositionAudioTrack1 insertTimeRange:CMTimeRangeMake(kCMTimeZero, avAsset.duration) ofTrack:clipAudioTrack1 atTime:kCMTimeZero error:nil];
AVMutableCompositionTrack *compositionAudioTrack2 = [composition addMutableTrackWithMediaType:AVMediaTypeAudio preferredTrackID:kCMPersistentTrackID_Invalid];
[compositionAudioTrack2 setPreferredVolume:1.0];
NSString *soundOne2 =[[NSBundle mainBundle]pathForResource:@"song" ofType:@"caf"];
NSURL *url2 = [NSURL fileURLWithPath:soundOne2];
AVAsset *avAsset2 = [AVURLAsset URLAssetWithURL:url2 options:nil];
NSArray *tracks2 = [avAsset2 tracksWithMediaType:AVMediaTypeAudio];
AVAssetTrack *clipAudioTrack2 = [[avAsset2 tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
[compositionAudioTrack1 insertTimeRange:CMTimeRangeMake(kCMTimeZero, avAsset2.duration) ofTrack:clipAudioTrack2 atTime:kCMTimeZero error:nil];
AVAssetExportSession *exportSession = [AVAssetExportSession
exportSessionWithAsset:composition
presetName:AVAssetExportPresetAppleM4A];
if (nil == exportSession) return NO;
NSString *soundOneNew = [documentsDirectory stringByAppendingPathComponent:@"combined10.m4a"];
//NSLog(@"Output file path - %@",soundOneNew);
// configure export session output with all our parameters
exportSession.outputURL = [NSURL fileURLWithPath:soundOneNew]; // output path
exportSession.outputFileType = AVFileTypeAppleM4A; // output file type
// perform the export
[exportSession exportAsynchronouslyWithCompletionHandler:^{
if (AVAssetExportSessionStatusCompleted == exportSession.status) {
NSLog(@"AVAssetExportSessionStatusCompleted");
} else if (AVAssetExportSessionStatusFailed == exportSession.status) {
// a failure may happen because of an event out of your control
// for example, an interruption like a phone call comming in
// make sure and handle this case appropriately
NSLog(@"AVAssetExportSessionStatusFailed");
} else {
NSLog(@"Export Session Status: %d", exportSession.status);
}
}];
return YES;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With