pydub audio glitches when splitting/joining mp3

I'm experimenting with pydub, which I like very much, but I'm having a problem when splitting and rejoining an mp3 file.

I need to generate a series of small snippets of audio on the server, which will be sent in sequence to a web browser and played via an <audio/> element. The playback needs to be seamless, with no audible joins between the separate pieces. At the moment, however, the joins are quite obvious: sometimes there is a short silence and sometimes a strange audio glitch.

In my proof of concept code I have taken a single large mp3 and split it into 1-second chunks as follows:

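import io

from pydub import AudioSegment

song = AudioSegment.from_mp3('my.mp3')
song_pos = 0
while song_pos < 100:
    p1 = song_pos * 1000  # start of the chunk, in milliseconds
    p2 = p1 + 1000        # end of the chunk, in milliseconds

    segment = song[p1:p2]  # 1 second of audio

    output = io.BytesIO()  # in-memory buffer for the encoded bytes
    segment.export(output, format="mp3")
    client_data = output.getvalue()  # send this to client

    song_pos += 1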

The client_data values are streamed to the browser over a long-lived http connection:

socket.send("HTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: audio/mp3\r\n\r\n")

and then for each new chunk of audio

socket.send(client_data)

Can anyone explain the glitches that I am hearing, and suggest a way to eliminate them?

codebox asked Oct 18 '22 16:10


1 Answer

Upgrading my comment to an answer:

The primary issue is that the MP3 encoders used by ffmpeg add silence to the end of the encoded audio, and your approach produces many individual audio files, each with its own padding.
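A rough calculation shows why a separately encoded chunk cannot be exactly one second long. This is only a sketch: it assumes MPEG-1 Layer III frames of 1152 samples and a 44.1 kHz source (the question does not state the sample rate), and it ignores any extra delay the encoder adds at the start. `padding_for_chunk` is a hypothetical helper, not part of pydub.

```python
import math

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III frame size


def padding_for_chunk(chunk_ms, sample_rate=44100):
    """Return (frames, padding_samples, padding_ms) for one encoded chunk.

    sample_rate is an assumption; use the rate of the real source file.
    """
    samples = chunk_ms * sample_rate // 1000
    # The encoder can only write whole frames, so round up.
    frames = math.ceil(samples / SAMPLES_PER_FRAME)
    padding = frames * SAMPLES_PER_FRAME - samples
    return frames, padding, padding * 1000.0 / sample_rate


frames, padding, padding_ms = padding_for_chunk(1000)
print(frames, padding, round(padding_ms, 1))
```

Under these assumptions a 1-second chunk needs 39 frames, which leaves 828 samples (roughly 19 ms) of filler at the end of every chunk, enough to be heard as a gap.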

If possible, use a lossless format like WAV and then reduce the file size with gzip or similar. You may also be able to use lossless audio compression (for example, FLAC), but whether that avoids the problem probably depends on how the encoder works.

I don't have a conclusive explanation for the audible artifacts you're hearing, but it could be that you're splitting the audio at a point where the signal is non-zero. If a sound begins with a sample with a value of 100 (for example), that would sound like a digital pop. The MP3 compression may also alter the sound, especially at lower bit rates. If this is the issue, a 1 ms fade in will eliminate the pop without a noticeable audible "fade" (though it could introduce other artifacts). A longer fade in (like 20 or 50 ms) would avoid strange frequency domain artifacts but would introduce a noticeable "fade in".
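To make the fade idea concrete, here is a minimal sketch of a linear fade in applied to raw sample values. It assumes mono audio held in a plain list of integers; the helper name `fade_in` and the 44.1 kHz rate in the usage lines are illustrative assumptions. With pydub itself you would more likely call the segment's own `fade_in` method with a duration in milliseconds rather than touch samples directly.

```python
def fade_in(samples, sample_rate, fade_ms=1):
    """Return a copy of samples with a linear fade over the first fade_ms.

    Assumes mono audio; interleaved stereo would need per-channel handling.
    """
    n = min(len(samples), sample_rate * fade_ms // 1000)
    faded = list(samples)
    for i in range(n):
        # Gain ramps from 0 at the first sample up to (almost) 1 at sample n.
        faded[i] = int(samples[i] * i / n)
    return faded


# A chunk that starts abruptly at a value of 100, as in the example above.
chunk = [100] * 100
smoothed = fade_in(chunk, 44100, fade_ms=1)
print(smoothed[0], smoothed[22], smoothed[44])
```

The chunk now starts at zero and ramps up over 44 samples, so the first sample no longer jumps straight to 100.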

If you're willing to do a little more (coding) work, you can search for a "zero crossing" (basically, a place where the signal is at a zero point naturally) and split the audio there.
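One way to sketch the zero-crossing search, assuming mono audio as a list of signed integers (with pydub, something like `get_array_of_samples()` could supply these). `nearest_zero_crossing` is a hypothetical helper written for this example, not a pydub function.

```python
def nearest_zero_crossing(samples, target):
    """Return the index closest to target where the signal is zero or
    changes sign relative to the previous sample.

    Falls back to target itself if the signal never crosses zero.
    """
    def is_crossing(i):
        if samples[i] == 0:
            return True
        return i > 0 and (samples[i - 1] < 0) != (samples[i] < 0)

    # Search outwards from target, one sample further each step.
    for offset in range(len(samples)):
        for i in (target - offset, target + offset):
            if 0 <= i < len(samples) and is_crossing(i):
                return i
    return target


signal = [5, 3, 1, -2, -4, -1, 2, 6]
print(nearest_zero_crossing(signal, 4))
```

You would then cut at the returned index instead of at the exact 1-second mark, so chunk lengths vary slightly from one chunk to the next.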

Probably the best approach if it's possible:

Encode the entire signal as a single compressed file, and send the bytes of that one file down to the client in chunks for playback as a single stream. If you use constant bitrate (CBR) mp3 encoding, you can send almost perfectly 1-second-long chunks just by counting bytes. For example, 256 kbps CBR is 256,000 bits per second, or 32,000 bytes, so just send about 32 KB at a time.
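A minimal sketch of the byte-counting approach, assuming the whole song has already been encoded once as CBR and is available as a file-like object. `cbr_chunks` is a hypothetical helper, and the zero-filled buffer below only stands in for real MP3 data.

```python
import io


def cbr_chunks(stream, bitrate_kbps, seconds=1.0):
    """Yield successive byte chunks holding roughly `seconds` of CBR audio."""
    # kbps means 1000 bits per second; divide by 8 to get bytes.
    chunk_size = int(bitrate_kbps * 1000 / 8 * seconds)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for an encoded file: 80,000 bytes at 256 kbps is 2.5 seconds.
fake_mp3 = io.BytesIO(b"\x00" * 80000)
sizes = [len(chunk) for chunk in cbr_chunks(fake_mp3, 256)]
print(sizes)
```

Each yielded chunk would be passed to `socket.send` in place of `client_data`. Because the browser receives one continuous MP3 stream, the chunk boundaries do not need to line up with frame boundaries. Any tag data at the start of the file means the first chunk holds slightly less than a full second of audio.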

Jiaaro answered Oct 20 '22 10:10