I need to merge a video file and an audio file into a single video file, so that the output always has the same duration as the input video, and the audio is looped or cut to fit that duration.
From what I've read, the technical term for this merging operation is "muxing".
As an example, suppose we have an input video of 10 seconds and an audio file of 4 seconds. The output video would be 10 seconds long (always the same as the input video), and the audio would play 2.5 times (the first 2 plays cover the first 8 seconds, and then 2 of the 4 seconds fill the rest).
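The arithmetic of that example can be written down as a small sketch. The names `LoopPlan` and `planLoops` are made up for illustration, and durations are assumed to be in microseconds, the unit that `MediaExtractor.sampleTime` uses.

```kotlin
// Number of complete plays of the audio clip, plus the part of one more play
// that is needed to fill the video exactly.
data class LoopPlan(val fullLoops: Long, val remainderUs: Long)

fun planLoops(videoUs: Long, audioUs: Long): LoopPlan {
    require(videoUs >= 0 && audioUs > 0) { "durations must be non-negative, audio non-zero" }
    return LoopPlan(videoUs / audioUs, videoUs % audioUs)
}
```

For the 10-second video and 4-second audio, `planLoops(10_000_000L, 4_000_000L)` gives 2 full loops and a remainder of 2,000,000 µs.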
While I have found a solution for muxing a video and an audio file (here), I've come across multiple issues:
I can't figure out how to loop the writing of the audio content when needed. It keeps giving me an error, no matter what I try.
The input files must be of specific file formats. Otherwise, the code might throw an exception, or (in very rare cases) worse: create a video file that shows only black content. Moreover, sometimes a '.mkv' file (for example) is accepted and sometimes it isn't, even though both can be played in a video player app.
The current code works in terms of buffers rather than real duration. This means that in many cases it might stop muxing the audio too early, and the output video file will have shorter audio than it should, even though the video is long enough.
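On the last point, one common way to drive the loop by time rather than by "one buffer of each per iteration" is to always write the sample with the smaller timestamp next, and to keep going until both tracks are exhausted. The sketch below only models the ordering on plain lists of timestamps; `Track` and `interleaveByTime` are hypothetical names, and on a device the timestamps would come from `MediaExtractor.sampleTime`.

```kotlin
enum class Track { VIDEO, AUDIO }

// Order in which samples would be written when, at every step, the track whose
// next sample has the smaller timestamp goes first. Neither track stops early
// just because the other one has run out of samples.
fun interleaveByTime(videoPtsUs: List<Long>, audioPtsUs: List<Long>): List<Pair<Track, Long>> {
    val out = ArrayList<Pair<Track, Long>>(videoPtsUs.size + audioPtsUs.size)
    var v = 0
    var a = 0
    while (v < videoPtsUs.size || a < audioPtsUs.size) {
        val takeVideo = a >= audioPtsUs.size ||
            (v < videoPtsUs.size && videoPtsUs[v] <= audioPtsUs[a])
        if (takeVideo) {
            out.add(Track.VIDEO to videoPtsUs[v])
            v++
        } else {
            out.add(Track.AUDIO to audioPtsUs[a])
            a++
        }
    }
    return out
}
```

With this shape, the end of one track no longer ends the whole loop, which is what makes the output audio shorter than expected in the buffer-driven version.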
I tried to make the MediaExtractor of the audio go back to its beginning each time it reached the end, by using:
    if (audioBufferInfo.size < 0) {
        Log.d("AppLog", "reached end of audio, looping...")
        audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
        audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
    }
For checking the types of the files, I tried using MediaMetadataRetriever and then checking the mime-type. I think the supported ones are those marked with "Encoder" in the docs (here), but I'm not sure about this. I also don't know which mime type corresponds to which of the formats mentioned there.
I also tried to re-initialize all that's related to the audio, but it didn't work either.
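Regarding the file-type check: the code below always uses track 0 of each file, which assumes that the first track of the video file really is the video track. A minimal sketch of picking the track by MIME prefix instead is shown here. `findTrack` is a hypothetical helper; on a device the list would be built from `MediaExtractor.getTrackFormat(i).getString(MediaFormat.KEY_MIME)` for each `i` below `trackCount`. Whether a wrong track choice explains the black output is only a guess.

```kotlin
// Index of the first track whose MIME type starts with the given prefix
// (for example "video/" or "audio/"), or -1 when the file has no such track.
fun findTrack(trackMimes: List<String?>, prefix: String): Int =
    trackMimes.indexOfFirst { it != null && it.startsWith(prefix) }
```

For a file whose tracks are `["audio/mp4a-latm", "video/avc"]`, `findTrack(mimes, "video/")` returns 1, so the muxing function could refuse the input early when the result is -1.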
Here's my current code for the muxing itself (full sample project available here):
    import android.media.MediaCodec
    import android.media.MediaExtractor
    import android.media.MediaMuxer
    import java.io.File
    import java.nio.ByteBuffer

    object VideoAndAudioMuxer {
        // based on: https://stackoverflow.com/a/31591485/878126
        fun joinVideoAndAudio(videoFile: File, audioFile: File, outputFile: File): Boolean {
            try {
                // val videoMediaMetadataRetriever = MediaMetadataRetriever()
                // videoMediaMetadataRetriever.setDataSource(videoFile.absolutePath)
                // val videoDurationInMs =
                //     videoMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION).toLong()
                // val videoMimeType =
                //     videoMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_MIMETYPE)
                // val audioMediaMetadataRetriever = MediaMetadataRetriever()
                // audioMediaMetadataRetriever.setDataSource(audioFile.absolutePath)
                // val audioDurationInMs =
                //     audioMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION).toLong()
                // val audioMimeType =
                //     audioMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_MIMETYPE)
                // Log.d(
                //     "AppLog",
                //     "videoDuration:$videoDurationInMs audioDuration:$audioDurationInMs videoMimeType:$videoMimeType audioMimeType:$audioMimeType"
                // )
                // videoMediaMetadataRetriever.release()
                // audioMediaMetadataRetriever.release()
                val muxer = MediaMuxer(outputFile.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
                val sampleSize = 256 * 1024
                // video: an extractor needs a data source and a selected track before it can be read
                val videoExtractor = MediaExtractor()
                videoExtractor.setDataSource(videoFile.absolutePath)
                videoExtractor.selectTrack(0)
                videoExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                val videoFormat = videoExtractor.getTrackFormat(0)
                val videoTrack = muxer.addTrack(videoFormat)
                val videoBuf = ByteBuffer.allocate(sampleSize)
                val videoBufferInfo = MediaCodec.BufferInfo()
                // Log.d("AppLog", "Video Format $videoFormat")
                // audio
                val audioExtractor = MediaExtractor()
                audioExtractor.setDataSource(audioFile.absolutePath)
                audioExtractor.selectTrack(0)
                audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                val audioFormat = audioExtractor.getTrackFormat(0)
                val audioTrack = muxer.addTrack(audioFormat)
                val audioBuf = ByteBuffer.allocate(sampleSize)
                val audioBufferInfo = MediaCodec.BufferInfo()
                // Log.d("AppLog", "Audio Format $audioFormat")
                // all tracks are added, so the muxer can be started
                muxer.start()
                // Log.d("AppLog", "muxing video&audio...")
                // val minimalDurationInMs = Math.min(videoDurationInMs, audioDurationInMs)
                while (true) {
                    videoBufferInfo.size = videoExtractor.readSampleData(videoBuf, 0)
                    audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
                    if (audioBufferInfo.size < 0) {
                        // Log.d("AppLog", "reached end of audio, looping...")
                        // TODO somehow start from beginning of the audio again, for looping till the video ends
                        // audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                        // audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
                    }
                    if (videoBufferInfo.size < 0 || audioBufferInfo.size < 0) {
                        // Log.d("AppLog", "reached end of video")
                        videoBufferInfo.size = 0
                        audioBufferInfo.size = 0
                        break
                    } else {
                        // val donePercentage = videoExtractor.sampleTime / minimalDurationInMs / 10L
                        // Log.d("AppLog", "$donePercentage")
                        // video muxing
                        videoBufferInfo.presentationTimeUs = videoExtractor.sampleTime
                        videoBufferInfo.flags = videoExtractor.sampleFlags
                        muxer.writeSampleData(videoTrack, videoBuf, videoBufferInfo)
                        videoExtractor.advance()
                        // audio muxing
                        audioBufferInfo.presentationTimeUs = audioExtractor.sampleTime
                        audioBufferInfo.flags = audioExtractor.sampleFlags
                        muxer.writeSampleData(audioTrack, audioBuf, audioBufferInfo)
                        audioExtractor.advance()
                    }
                }
                // finish the output file and free the native resources
                muxer.stop()
                muxer.release()
                videoExtractor.release()
                audioExtractor.release()
                // Log.d("AppLog", "success")
                return true
            } catch (e: Exception) {
                // Log.d("AppLog", "Error " + e.message)
                return false
            }
        }
    }
1: How can I mux the video and audio files so that the audio loops when it is shorter (in duration) than the video?
2: How can I make the audio get cut precisely when the video ends (no remainder of either video or audio)?
3: How can I check, before calling this function, whether the current device can handle the given input files and actually mux them? Is there a way to check at runtime which formats are supported for this kind of operation, instead of relying on a list in the docs that might change in the future?
I have the same scenario.
1: When audioBufferInfo.size < 0, seek back to the start. But remember, you need to accumulate presentationTimeUs.
2: Get the video duration; when the looped audio reaches that duration (use presentationTimeUs for this too), cut.
3: The audio file needs to be MediaFormat.MIMETYPE_AUDIO_AMR_NB. On my testing machines, it worked fine.
Here is the code:
    // needs android.media.{MediaCodec, MediaExtractor, MediaFormat, MediaMuxer}, java.io.File, java.nio.ByteBuffer
    private fun muxing(musicName: String) {
        val saveFile = File(DirUtils.getPublicMediaPath(), "$saveName.mp4")
        if (saveFile.exists()) {
            // remove a stale output file first (assumed intent of this check)
            saveFile.delete()
        }
        try {
            // get the video file duration in microseconds
            val duration = getVideoDuration(mSaveFile!!.absolutePath)
            val videoExtractor = MediaExtractor()
            // assumption: the video source is the same file whose duration was read above
            videoExtractor.setDataSource(mSaveFile!!.absolutePath)
            val audioExtractor = MediaExtractor()
            val afdd = MucangConfig.getContext().assets.openFd(musicName)
            audioExtractor.setDataSource(afdd.fileDescriptor, afdd.startOffset, afdd.length)
            val muxer = MediaMuxer(saveFile.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
            // a track must be selected before its samples can be read
            videoExtractor.selectTrack(0)
            val videoFormat = videoExtractor.getTrackFormat(0)
            val videoTrack = muxer.addTrack(videoFormat)
            audioExtractor.selectTrack(0)
            val audioFormat = audioExtractor.getTrackFormat(0)
            val audioTrack = muxer.addTrack(audioFormat)
            var sawEOS = false
            val offset = 100
            val sampleSize = 1000 * 1024
            val videoBuf = ByteBuffer.allocate(sampleSize)
            val audioBuf = ByteBuffer.allocate(sampleSize)
            val videoBufferInfo = MediaCodec.BufferInfo()
            val audioBufferInfo = MediaCodec.BufferInfo()
            videoExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
            audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
            // all tracks are added, so the muxer can be started
            muxer.start()
            val frameRate = videoFormat.getInteger(MediaFormat.KEY_FRAME_RATE)
            val videoSampleTime = 1000 * 1000 / frameRate
            // copy every video sample, spacing the timestamps by one frame duration
            while (!sawEOS) {
                videoBufferInfo.offset = offset
                videoBufferInfo.size = videoExtractor.readSampleData(videoBuf, offset)
                if (videoBufferInfo.size < 0) {
                    sawEOS = true
                    videoBufferInfo.size = 0
                } else {
                    videoBufferInfo.presentationTimeUs += videoSampleTime
                    videoBufferInfo.flags = videoExtractor.sampleFlags
                    muxer.writeSampleData(videoTrack, videoBuf, videoBufferInfo)
                    videoExtractor.advance()
                }
            }
            // copy audio samples, rewinding at the end of the clip, until the video duration is reached
            var sawEOS2 = false
            var sampleTime = 0L
            while (!sawEOS2) {
                audioBufferInfo.offset = offset
                audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, offset)
                if (audioBufferInfo.presentationTimeUs >= duration) {
                    // the accumulated audio time has reached the video duration: cut here
                    sawEOS2 = true
                    audioBufferInfo.size = 0
                } else if (audioBufferInfo.size < 0) {
                    // end of the audio clip: remember the time reached so far and start over
                    sampleTime = audioBufferInfo.presentationTimeUs
                    audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                } else {
                    audioBufferInfo.presentationTimeUs = audioExtractor.sampleTime + sampleTime
                    audioBufferInfo.flags = audioExtractor.sampleFlags
                    muxer.writeSampleData(audioTrack, audioBuf, audioBufferInfo)
                    audioExtractor.advance()
                }
            }
            // finish the output file and free the native resources
            muxer.stop()
            muxer.release()
            videoExtractor.release()
            audioExtractor.release()
            afdd.close()
        } catch (e: Exception) {
            LogUtils.e(TAG, "Mixer Error:" + e.message)
        }
    }
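The timestamp bookkeeping from points 1 and 2 can be looked at in isolation, without any Android classes. The sketch below is a variation rather than the code above: it shifts each pass by the clip duration (not by the last written timestamp), so the first sample of a new pass never shares a timestamp with the last sample of the previous pass. `loopedAudioPts` and its parameters are hypothetical names, and `clipDurationUs` is assumed to be larger than the last timestamp of one pass.

```kotlin
// Output timestamps for an audio clip that is replayed until the video ends.
// clipPtsUs: sample timestamps of one pass over the clip, in ascending order.
// clipDurationUs: length of one pass; added to the offset after every pass.
// Samples whose shifted timestamp would reach videoDurationUs are dropped.
fun loopedAudioPts(clipPtsUs: List<Long>, clipDurationUs: Long, videoDurationUs: Long): List<Long> {
    require(clipPtsUs.isNotEmpty() && clipDurationUs > 0) { "need samples and a positive clip duration" }
    val out = ArrayList<Long>()
    var offsetUs = 0L
    var index = 0
    while (clipPtsUs[index] + offsetUs < videoDurationUs) {
        out.add(clipPtsUs[index] + offsetUs)
        index++
        if (index == clipPtsUs.size) {
            // end of the clip: rewind and shift the next pass by one clip length
            index = 0
            offsetUs += clipDurationUs
        }
    }
    return out
}
```

For a 4-second clip with one sample per second and a 10-second video, this yields ten timestamps from 0 to 9,000,000 µs, that is, 2.5 plays of the clip with strictly increasing times.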