
How can I encode and segment audio files without having gaps (or audio pops) between segments when I reconstruct it?

I'm working on a web application that requires streaming and synchronization of multiple audio files. For this, I am using the Web Audio API over HTML5 audio tags because of the importance of timing audio.

Currently, I'm using ffmpeg's segmentation feature to encode and segment the audio files into smaller chunks. The reason I am segmenting them is so I can start streaming from the middle of the file instead of starting from the beginning (otherwise I would've just split the files using UNIX split, as shown here). The problem is that when I string the audio segments back together, I get an audio pop between segments.

If I encode the segments using a PCM encoding (pcm_s24le) in a .wav file, the playback is seamless, which leads me to believe that the encoder is padding either the beginning or the end of the file. Since I will be dealing with many different audio files, using .wav would require far too much bandwidth.

I'm looking for an answer to one of the following:

  • How can I segment encoded audio files seamlessly,
  • How can I force an encoder to NOT pad audio frames using ffmpeg (or another utility), or
  • What is a better way to stream audio (starting at an arbitrary track time) without using an audio tag?

System Information

  • Custom node.js server
  • Upon upload of an audio file, node.js pipes the data into ffmpeg's encoder
  • Need to use HTML5 Web Audio API supported encoding
  • Server sends audio chunks 1 at a time through a WebSockets socket

Thanks in advance. I've tried to be as clear as possible but if you need clarification I'd be more than willing to provide it.

fenduru asked Feb 13 '13 03:02


1 Answer

Since PCM is an uncompressed format, seamless playback is expected: there is nothing that could create a glitch. The same holds for a lossless codec such as FLAC. With a lossy codec such as MP3 or WMA, on the other hand, there is no way to avoid glitches without some intervention. A WMA decoder, for example, will always give you more PCM than you originally fed to the encoder. Those extra bytes produce a glitch, and they also make the concatenated playback (cutlist) longer than it should be. You can try to smooth the glitch with some DSP filtering, or with something as simple as crossfading the transitions. That may give useful results.
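The crossfading idea above can be sketched on already-decoded PCM. This is a minimal illustration, not a tested fix for the pops: the function name `crossfade` and the `overlap` parameter are made up for the example, and it assumes both segments are mono `Float32Array`s at the same sample rate.

```javascript
// Linear crossfade between two decoded PCM segments.
// Assumption: `a` and `b` are mono Float32Arrays at the same sample rate.
// `overlap` is how many samples at the end of `a` are blended with the
// same number of samples at the start of `b` (hypothetical parameter).
function crossfade(a, b, overlap) {
  if (overlap < 0 || overlap > a.length || overlap > b.length) {
    throw new RangeError("overlap must fit inside both segments");
  }
  const out = new Float32Array(a.length + b.length - overlap);
  const head = a.length - overlap;

  // Copy the part of `a` that is not blended.
  out.set(a.subarray(0, head), 0);

  // Blend: the weight of `a` falls linearly while the weight of `b` rises.
  for (let i = 0; i < overlap; i++) {
    const t = (i + 0.5) / overlap;
    out[head + i] = a[head + i] * (1 - t) + b[i] * t;
  }

  // Copy the rest of `b` after the blended region.
  out.set(b.subarray(overlap), a.length);
  return out;
}
```

Note that the result is `overlap` samples shorter than the two segments laid end to end, so a crossfade hides the pop but does not by itself restore exact timing.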

If a lossless codec is not acceptable because of bandwidth, another approach would be to create a single compressed file with a lossy codec such as MP3 and start streaming from a calculated position. Of course, you cannot seek with sample accuracy as you can with PCM, and you will get a small amount of useless PCM at the start of decoding, because you begin decoding in the middle of the compressed data without the "previous data" the decoder needs. I would suggest encoding such files at a constant bitrate, because that lets you compute the seek position in the compressed file more accurately before you start streaming.
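For a constant-bitrate file, the seek position described above is simple arithmetic. The sketch below is only an estimate under stated assumptions: `cbrByteOffset` and its `headerBytes` parameter (for any tag or header bytes before the audio data) are hypothetical names, and the bitrate is taken in kbit/s with 1 kbit = 1000 bits.

```javascript
// Estimate the byte offset for a given time in a constant-bitrate file.
// Assumptions: the bitrate really is constant, `bitrateKbps` is in kbit/s,
// and `headerBytes` (hypothetical) counts any non-audio bytes at the start.
function cbrByteOffset(seconds, bitrateKbps, headerBytes = 0) {
  const bytesPerSecond = (bitrateKbps * 1000) / 8;
  return headerBytes + Math.floor(seconds * bytesPerSecond);
}

// Example: 10 seconds into a 128 kbit/s stream.
const offset = cbrByteOffset(10, 128);
```

The offset will usually fall in the middle of a frame, so the decoder still has to resynchronize on the next frame header, which is part of the "useless PCM" mentioned above.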

As for glitches with this approach: if you produce the MP3 files WITHOUT stopping the encoder between them, there will be no glitch when switching files, because you have simply divided one continuous stream of compressed data across several files. Of course, you will probably have to implement this yourself.
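A rough sketch of that last idea: encode once, then cut the encoder's output bytes into pieces. The helper names `splitBytes` and `joinBytes` are invented for the example, and the chunk size is arbitrary. The only property shown is that joining the pieces in order gives back the original byte stream unchanged.

```javascript
// Split one continuous encoded byte stream into fixed-size pieces.
// Assumption: `data` is a Uint8Array holding the output of a single,
// uninterrupted encoder run; `chunkBytes` is an arbitrary piece size.
function splitBytes(data, chunkBytes) {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const chunks = [];
  for (let off = 0; off < data.length; off += chunkBytes) {
    // subarray clamps the end index, so the last piece may be shorter.
    chunks.push(data.subarray(off, off + chunkBytes));
  }
  return chunks;
}

// Rejoin pieces in order, e.g. on the receiving side before decoding.
function joinBytes(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let off = 0;
  for (const c of chunks) {
    out.set(c, off);
    off += c.length;
  }
  return out;
}
```

Because the cuts here ignore frame boundaries, this only works if the receiver joins the pieces before handing them to a decoder. Decoding each piece on its own is a different situation and is not covered by this sketch.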

user1764961 answered Oct 20 '22 11:10