I'm working on a web application that requires streaming and synchronizing multiple audio files. For this, I am using the Web Audio API rather than HTML5 audio tags because precise timing is important.
Currently, I'm using FFmpeg's segmentation feature to encode the audio files and split them into smaller chunks. I am segmenting them so that I can start streaming from the middle of a file instead of from the beginning (otherwise I would've just split the files using UNIX split, as shown here). The problem is that when I string the audio segments back together, I get an audio pop between segments.
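A minimal sketch of the back-to-back scheduling side, assuming each segment has already been decoded into an AudioBuffer (for example with decodeAudioData). The helper name segmentStartTimes is hypothetical; it only computes start times from frame counts so that each segment begins exactly where the previous one ends. It does not remove any padding the encoder added inside each segment, which is the likely source of the pop.

```typescript
// Computes the AudioContext time at which each decoded segment should start
// so that segment i begins exactly where segment i-1 ends.
// Counting in sample frames (not summed seconds) avoids accumulating rounding error.
function segmentStartTimes(frameCounts: number[], sampleRate: number, startTime: number): number[] {
  const times: number[] = [];
  let framesSoFar = 0;
  for (const frames of frameCounts) {
    times.push(startTime + framesSoFar / sampleRate);
    framesSoFar += frames;
  }
  return times;
}
```

In a browser this would be called with frameCounts = buffers.map(b => b.length) and sampleRate = ctx.sampleRate (decodeAudioData resamples to the context rate); then, for each segment, create a source with ctx.createBufferSource(), set source.buffer, connect it to ctx.destination and call source.start(times[i]).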
If I encode the segments as PCM (pcm_s24le) in a .wav file, playback is seamless, which leads me to believe that the encoder is padding either the beginning or the end of each segment. Since I will be dealing with many different audio files, using .wav would require far too much bandwidth.
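To put "far too much bandwidth" in numbers, here is a small worked calculation. The sample rate, channel count and the 128 kbit/s comparison bitrate are assumptions for illustration; only the 24-bit sample size comes from pcm_s24le.

```typescript
// Uncompressed PCM bitrate in bit/s: sample rate * bits per sample * channels.
function pcmBitrate(sampleRate: number, bitsPerSample: number, channels: number): number {
  return sampleRate * bitsPerSample * channels;
}

// Bytes needed for `seconds` of audio at a given bitrate (bit/s).
function bytesFor(seconds: number, bitrate: number): number {
  return (bitrate / 8) * seconds;
}

// Assumed format: 44.1 kHz stereo, 24-bit (pcm_s24le), versus a 128 kbit/s lossy stream.
const pcm = pcmBitrate(44100, 24, 2);        // 2,116,800 bit/s
const ratio = pcm / 128000;                  // about 16.5 times the lossy bitrate
const pcmPerMinute = bytesFor(60, pcm);      // 15,876,000 bytes per minute
const lossyPerMinute = bytesFor(60, 128000); // 960,000 bytes per minute
console.log(pcm, ratio, pcmPerMinute, lossyPerMinute);
```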
I'm looking for one of the following solutions to the problem:
Thanks in advance. I've tried to be as clear as possible but if you need clarification I'd be more than willing to provide it.
Since PCM is an uncompressed format, seamless playback is expected: there is nothing in it that could create a glitch. The same holds if you use a lossless codec such as FLAC. On the other hand, if you use a lossy codec such as MP3 or WMA, there is no way to avoid glitches without some intervention. A WMA decoder, for example, will always give you more PCM than you originally fed to the encoder, because lossy encoders typically add priming samples at the start (encoder delay) and pad the last frame out to a whole frame. Those extra samples produce a glitch at every joint and also inflate the duration, so a concatenated playback (cutlist) will last longer than it should. You can try to smooth the glitch with some DSP filtering, or even with simple measures such as crossfading the transitions; that may give usable results.
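One way to try the crossfade mentioned above, sketched under the assumption that the segments are available as decoded PCM, one Float32Array per channel (the form AudioBuffer.getChannelData returns). The name crossfadeJoin and the linear fade shape are illustrative choices, not part of any API.

```typescript
// Joins two decoded PCM segments with a linear crossfade.
// The last `overlap` samples of `a` are mixed with the first `overlap` samples of `b`,
// so the result is `overlap` samples shorter than a.length + b.length.
function crossfadeJoin(a: Float32Array, b: Float32Array, overlap: number): Float32Array {
  if (!Number.isInteger(overlap) || overlap < 0 || overlap > a.length || overlap > b.length) {
    throw new RangeError("overlap must be an integer in [0, min(a.length, b.length)]");
  }
  const out = new Float32Array(a.length + b.length - overlap);
  const fadeStart = a.length - overlap;
  // Untouched head of segment a.
  out.set(a.subarray(0, fadeStart), 0);
  // Overlap region: fade a out while fading b in; the two gains always sum to 1.
  for (let i = 0; i < overlap; i++) {
    const t = (i + 1) / (overlap + 1); // strictly between 0 and 1, rising across the overlap
    out[fadeStart + i] = a[fadeStart + i] * (1 - t) + b[i] * t;
  }
  // Untouched tail of segment b.
  out.set(b.subarray(overlap), fadeStart + overlap);
  return out;
}
```

For a stereo AudioBuffer you would run it once per channel and copy the results into a new buffer created with ctx.createBuffer(channels, joined.length, sampleRate) via copyToChannel. Whether a few milliseconds of overlap hides the pop well enough is something to test by ear.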
If a lossless codec is not acceptable because of bandwidth, another approach is to create compressed files with a lossy codec such as MP3 and start streaming from a calculated position. Of course, you cannot seek with sample accuracy as you can in PCM, and you will get a small amount of unusable PCM at the start of decoding, because the decoder begins in the middle of the compressed data without the preceding data it needs (for MP3, the bit reservoir and the overlapping transform windows). I would suggest encoding such files at a constant bitrate (CBR), because that lets you compute the seek position in the compressed file more accurately before you start streaming.
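The CBR seek calculation can be sketched as follows, assuming MPEG-1 Layer III (1152 samples per frame) at a constant bitrate. The name cbrSeekPoint and the audioDataStart parameter (the byte where the first frame begins, e.g. after an ID3v2 tag) are hypothetical. At sample rates where 144 * bitrate / sampleRate is not a whole number (e.g. 44.1 kHz), frames alternate between two lengths via the padding byte, so the byte offset is approximate and the stream should be resynchronised on the next frame header.

```typescript
// Assumptions: MPEG-1 Layer III (1152 samples per frame) and constant bitrate.
const SAMPLES_PER_FRAME = 1152;

interface SeekPoint {
  frameIndex: number; // index of the first frame to send
  byteOffset: number; // approximate byte position of that frame
  startTime: number;  // nominal time (s) at which that frame begins
}

function cbrSeekPoint(seconds: number, bitrate: number, sampleRate: number, audioDataStart = 0): SeekPoint {
  // Round down to the frame containing the requested time.
  const frameIndex = Math.floor((seconds * sampleRate) / SAMPLES_PER_FRAME);
  // Average CBR frame length in bytes: (1152 / 8) * bitrate / sampleRate = 144 * bitrate / sampleRate.
  const bytesPerFrame = ((SAMPLES_PER_FRAME / 8) * bitrate) / sampleRate;
  const byteOffset = audioDataStart + Math.floor(frameIndex * bytesPerFrame);
  const startTime = (frameIndex * SAMPLES_PER_FRAME) / sampleRate;
  return { frameIndex, byteOffset, startTime };
}
```

The startTime field tells you where in the track playback nominally resumes, which you need for keeping several files in sync; the first decoded frame after the seek will still contain the unusable PCM described above.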
As for glitches with this approach: if you encode the MP3 data in one continuous pass and only then divide it into files, WITHOUT stopping and restarting the encoder for each file, there will be no glitch when switching files, because you have simply split one compressed stream into several files. You will probably have to implement this yourself.
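A minimal sketch of that splitting step, assuming the input is one continuously encoded MPEG-1 Layer III stream that starts directly at a frame header (no ID3 tag) and does not use a free-format bitrate. The helper names are hypothetical. It walks the frame headers and cuts only at frame boundaries, so each chunk begins with a valid header and the chunks, concatenated in order, reproduce the original bytes exactly.

```typescript
// MPEG-1 Layer III bitrates in kbit/s, indexed by the 4-bit bitrate field (0 = free/invalid).
const BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0];
// MPEG-1 sample rates in Hz, indexed by the 2-bit sample-rate field (0 = reserved).
const SAMPLE_RATES = [44100, 48000, 32000, 0];

// Length in bytes of the MPEG-1 Layer III frame starting at `pos`, or 0 if no such header is there.
function mp3FrameLength(data: Uint8Array, pos: number): number {
  if (pos + 4 > data.length) return 0;
  const b1 = data[pos + 1];
  const b2 = data[pos + 2];
  const sync = data[pos] === 0xff && (b1 & 0xe0) === 0xe0; // 11 sync bits
  const mpeg1 = ((b1 >> 3) & 0x03) === 0x03;               // version bits 11 = MPEG-1
  const layer3 = ((b1 >> 1) & 0x03) === 0x01;              // layer bits 01 = Layer III
  if (!sync || !mpeg1 || !layer3) return 0;
  const bitrate = BITRATES_KBPS[b2 >> 4] * 1000;
  const sampleRate = SAMPLE_RATES[(b2 >> 2) & 0x03];
  const padding = (b2 >> 1) & 0x01;
  if (bitrate === 0 || sampleRate === 0) return 0; // free-format or reserved values
  return Math.floor((144 * bitrate) / sampleRate) + padding;
}

// Splits the stream into chunks of `framesPerChunk` whole frames (the last chunk may be shorter).
function splitAtFrameBoundaries(data: Uint8Array, framesPerChunk: number): Uint8Array[] {
  if (!Number.isInteger(framesPerChunk) || framesPerChunk < 1) {
    throw new RangeError("framesPerChunk must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  let chunkStart = 0;
  let pos = 0;
  let frames = 0;
  while (pos < data.length) {
    const len = mp3FrameLength(data, pos);
    if (len === 0) throw new Error(`no MPEG-1 Layer III frame header at byte ${pos}`);
    pos += len;
    frames++;
    if (frames === framesPerChunk) {
      chunks.push(data.subarray(chunkStart, pos));
      chunkStart = pos;
      frames = 0;
    }
  }
  if (chunkStart < data.length) chunks.push(data.subarray(chunkStart));
  return chunks;
}
```

The cut happens purely on the compressed bytes, with no re-encoding. This presumes the chunks are fed in order to a single decoder; decoding each chunk in isolation would again lack the preceding data discussed above.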