Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to mux (merge) video&audio, so that the audio will loop in the output video in case it's too short in duration?

Background

I'm required to merge a video file and an audio file to a single video file, so that:

  1. The output video file will of the same duration as the input video file
  2. The audio in the output file will only be of the input audio file. If it's too short, it will loop to the end (can stop in the end if needed). This means that once the audio has finished playing while the video hasn't , I should play it again and again, till the video ends (concatenation of the audio).

The technical term of this merging operation is called "muxing", as I've read.

As an example, suppose we have an input video of 10 seconds, and an audio file of 4 seconds, the output video would be of 10 seconds (always the same as the input video), and the audio will play 2.5 times (first 2 cover the first 8 seconds, and then 2 seconds out of 4 for the rest) .

The problems

While I have found a solution of how to mux a video and an audio (here), I've come across multiple issues:

  1. I can't figure out how to loop the writing of the audio content when needed. It keeps giving me an error, no matter what I try

  2. The input files must be of specific file formats. Otherwise, it might throw an exception, or (in very rare cases) worse: create a video file that has black content. Even more: Sometimes a '.mkv' file (for example) could be fine, and sometimes it won't be accepted (and both can be played on a video player app).

  3. The current code handles buffers and not real duration. This means that in many cases, I might stop muxing the audio even though I shouldn't, and the output video file will have a shorter audio content , compared to the original, even though the video is long enough.

What I've tried

  • I tried to make the MediaExtractor of the audio to go to its beginning each time it reached the end, by using:

            if (audioBufferInfo.size < 0) {
                Log.d("AppLog", "reached end of audio, looping...")
                audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
            }
    
  • For checking the types of the files, I tried using MediaMetadataRetriever and then checking the mime-type. I think the supported ones are available on the docs (here) as those marked with "Encoder". Not sure about this. I also don't know which mime type is of which type that is mentioned there.

  • I also tried to re-initialize all that's related to the audio, but it didn't work either.

Here's my current code for the muxing itself (full sample project available here) :

object VideoAndAudioMuxer {
    //   based on:  https://stackoverflow.com/a/31591485/878126
    @WorkerThread
    fun joinVideoAndAudio(videoFile: File, audioFile: File, outputFile: File): Boolean {
        try {
            //            val videoMediaMetadataRetriever = MediaMetadataRetriever()
            //            videoMediaMetadataRetriever.setDataSource(videoFile.absolutePath)
            //            val videoDurationInMs =
            //                videoMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION).toLong()
            //            val videoMimeType =
            //                videoMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_MIMETYPE)
            //            val audioMediaMetadataRetriever = MediaMetadataRetriever()
            //            audioMediaMetadataRetriever.setDataSource(audioFile.absolutePath)
            //            val audioDurationInMs =
            //                audioMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION).toLong()
            //            val audioMimeType =
            //                audioMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_MIMETYPE)
            //            Log.d(
            //                "AppLog",
            //                "videoDuration:$videoDurationInMs audioDuration:$audioDurationInMs videoMimeType:$videoMimeType audioMimeType:$audioMimeType"
            //            )
            //            videoMediaMetadataRetriever.release()
            //            audioMediaMetadataRetriever.release()
            outputFile.delete()
            outputFile.createNewFile()
            val muxer = MediaMuxer(outputFile.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
            val sampleSize = 256 * 1024
            //video
            val videoExtractor = MediaExtractor()
            videoExtractor.setDataSource(videoFile.absolutePath)
            videoExtractor.selectTrack(0)
            videoExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
            val videoFormat = videoExtractor.getTrackFormat(0)
            val videoTrack = muxer.addTrack(videoFormat)
            val videoBuf = ByteBuffer.allocate(sampleSize)
            val videoBufferInfo = MediaCodec.BufferInfo()
//            Log.d("AppLog", "Video Format $videoFormat")
            //audio
            val audioExtractor = MediaExtractor()
            audioExtractor.setDataSource(audioFile.absolutePath)
            audioExtractor.selectTrack(0)
            audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
            val audioFormat = audioExtractor.getTrackFormat(0)
            val audioTrack = muxer.addTrack(audioFormat)
            val audioBuf = ByteBuffer.allocate(sampleSize)
            val audioBufferInfo = MediaCodec.BufferInfo()
//            Log.d("AppLog", "Audio Format $audioFormat")
            //
            muxer.start()
//            Log.d("AppLog", "muxing video&audio...")
            //            val minimalDurationInMs = Math.min(videoDurationInMs, audioDurationInMs)
            while (true) {
                videoBufferInfo.size = videoExtractor.readSampleData(videoBuf, 0)
                audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
                if (audioBufferInfo.size < 0) {
                    //                    Log.d("AppLog", "reached end of audio, looping...")
                    //TODO somehow start from beginning of the audio again, for looping till the video ends
                    //                    audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                    //                    audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
                }
                if (videoBufferInfo.size < 0 || audioBufferInfo.size < 0) {
//                    Log.d("AppLog", "reached end of video")
                    videoBufferInfo.size = 0
                    audioBufferInfo.size = 0
                    break
                } else {
                    //                    val donePercentage = videoExtractor.sampleTime / minimalDurationInMs / 10L
                    //                    Log.d("AppLog", "$donePercentage")
                    // video muxing
                    videoBufferInfo.presentationTimeUs = videoExtractor.sampleTime
                    videoBufferInfo.flags = videoExtractor.sampleFlags
                    muxer.writeSampleData(videoTrack, videoBuf, videoBufferInfo)
                    videoExtractor.advance()
                    // audio muxing
                    audioBufferInfo.presentationTimeUs = audioExtractor.sampleTime
                    audioBufferInfo.flags = audioExtractor.sampleFlags
                    muxer.writeSampleData(audioTrack, audioBuf, audioBufferInfo)
                    audioExtractor.advance()
                }
            }
            muxer.stop()
            muxer.release()
//            Log.d("AppLog", "success")
            return true
        } catch (e: Exception) {
            e.printStackTrace()
//            Log.d("AppLog", "Error " + e.message)
        }
        return false
    }
}
  • I've also tried to use FFMPEG libary (here and here) , to see how to do it. It worked fine, but it has some possible issues : The library seems to take a lot of space, annoying licensing terms, and for some reason the sample couldn't play the output file that I got to create, unless I remove something in the command that will make the conversion much slower. I would really prefer to use the built in API than to use this library, even though it's a very powerful library... Also, it seems that for some input files, it didn't loop...

The questions

  1. How can I mux the video&audio files so that the audio will loop in case the audio is shorter (in duration) compared to the video?

  2. How can I do it so that the audio will get cut precisely when the video ends (no remainders on either video&audio) ?

  3. How can I check before calling this function, if the current device can handle the given input files and actually mux them ? Is there a way to check during runtime, which are supported for this kind of operation, instead of relying on a list on the docs that might change in the future?

like image 797
android developer Avatar asked Feb 19 '19 15:02

android developer


People also ask

Can you blend 2 videos together?

Use the merge videos app to stitch videos together. Use the video joiner to stitch together different video clips and images and trim each one as needed. Instantly create video content to share on your YouTube channel, TikTok page, website, and countless other destinations.

How do I merge two videos in parallel?

By using Video Merge you can: - Video Merge Side by Side: Select two videos and the videos will merge in side by side style. - Video Merge Up Down: Select two videos and the videos will merge in up down style. - Video Merge Sequentially: Select two videos and the videos will merge one after one style.


1 Answers

I hava the same scene.

  • 1: When audioBufferInfo.size < 0, seek to start. But remember, you need accumulate presentationTimeUs.

  • 2: Get video duration, when audio loop to the duration (use presentationTimeUs too), cut.

  • 3: The audio file need to be MediaFormat.MIMETYPE_AUDIO_AMR_NB or MediaFormat.MIMETYPE_AUDIO_AMR_WB or MediaFormat.MIMETYPE_AUDIO_AAC. On my testing machines, it worked fine.

Here is the code:

private fun muxing(musicName: String) {
    val saveFile = File(DirUtils.getPublicMediaPath(), "$saveName.mp4")
    if (saveFile.exists()) {
        saveFile.delete()
        PhotoHelper.sendMediaScannerBroadcast(saveFile)
    }
    try {
        // get the video file duration in microseconds
        val duration = getVideoDuration(mSaveFile!!.absolutePath)

        saveFile.createNewFile()

        val videoExtractor = MediaExtractor()
        videoExtractor.setDataSource(mSaveFile!!.absolutePath)

        val audioExtractor = MediaExtractor()
        val afdd = MucangConfig.getContext().assets.openFd(musicName)
        audioExtractor.setDataSource(afdd.fileDescriptor, afdd.startOffset, afdd.length)

        val muxer = MediaMuxer(saveFile.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)

        videoExtractor.selectTrack(0)
        val videoFormat = videoExtractor.getTrackFormat(0)
        val videoTrack = muxer.addTrack(videoFormat)

        audioExtractor.selectTrack(0)
        val audioFormat = audioExtractor.getTrackFormat(0)
        val audioTrack = muxer.addTrack(audioFormat)

        var sawEOS = false
        val offset = 100
        val sampleSize = 1000 * 1024
        val videoBuf = ByteBuffer.allocate(sampleSize)
        val audioBuf = ByteBuffer.allocate(sampleSize)
        val videoBufferInfo = MediaCodec.BufferInfo()
        val audioBufferInfo = MediaCodec.BufferInfo()

        videoExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
        audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)

        muxer.start()

        val frameRate = videoFormat.getInteger(MediaFormat.KEY_FRAME_RATE)
        val videoSampleTime = 1000 * 1000 / frameRate

        while (!sawEOS) {
            videoBufferInfo.offset = offset
            videoBufferInfo.size = videoExtractor.readSampleData(videoBuf, offset)

            if (videoBufferInfo.size < 0) {
                sawEOS = true
                videoBufferInfo.size = 0

            } else {
                videoBufferInfo.presentationTimeUs += videoSampleTime
                videoBufferInfo.flags = videoExtractor.sampleFlags
                muxer.writeSampleData(videoTrack, videoBuf, videoBufferInfo)
                videoExtractor.advance()
            }
        }

        var sawEOS2 = false
        var sampleTime = 0L
        while (!sawEOS2) {

            audioBufferInfo.offset = offset
            audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, offset)

            if (audioBufferInfo.presentationTimeUs >= duration) {
                sawEOS2 = true
                audioBufferInfo.size = 0
            } else {
                if (audioBufferInfo.size < 0) {
                    sampleTime = audioBufferInfo.presentationTimeUs
                    audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                    continue
                }
            }
            audioBufferInfo.presentationTimeUs = audioExtractor.sampleTime + sampleTime
            audioBufferInfo.flags = audioExtractor.sampleFlags
            muxer.writeSampleData(audioTrack, audioBuf, audioBufferInfo)
            audioExtractor.advance()
        }

        muxer.stop()
        muxer.release()
        videoExtractor.release()
        audioExtractor.release()
        afdd.close()
    } catch (e: Exception) {
        LogUtils.e(TAG, "Mixer Error:" + e.message)
    }
}
like image 177
lijia Avatar answered Oct 15 '22 23:10

lijia