 

Generate thumbnail for arbitrary audio file

I want to represent an audio file in an image with a maximum size of 180×180 pixels.

I want the image to give some representation of the audio file; think of something like SoundCloud's waveform (an amplitude graph).

[Screenshot of SoundCloud's player]

Does anyone have something for this? I have searched for a while, mainly for "audio visualization" and "audio thumbnailing", but I have not found anything useful.

I first posted this to ux.stackexchange.com; this is my attempt to reach programmers working on this.

joar, asked Feb 08 '12 at 19:02



2 Answers

You could also break the audio up into chunks and measure the RMS (a measure of loudness) of each one. Let's say you want an image that is 180 pixels wide.
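RMS (root mean square) is the square root of the mean of the squared sample values, so a loud stretch scores high regardless of whether its samples are positive or negative. pydub's `rms` property computes this over a segment's raw sample data. If the input is an uncompressed WAV file, the samples can be read with only the standard library's `wave` module (pydub relies on ffmpeg for MP3). A minimal sketch, assuming 16-bit PCM input; `read_wav_mono` and `rms` are hypothetical helper names, not pydub APIs:

```python
import array
import math
import sys
import wave

def read_wav_mono(source):
    """Read a 16-bit PCM WAV (path or binary file object) and return
    (frame_rate, samples), averaging each frame's channels down to mono."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        frame_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()

    # channel values are interleaved: L, R, L, R, ... for stereo
    mono = [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples), channels)]
    return frame_rate, mono

def rms(samples):
    """Root mean square of a sequence of samples (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

print(rms([2, -2, 2, -2]))  # 2.0
print(rms([0, 0, 3, 4]))    # 2.5
```

To get one value per pixel column, split `mono` into 180 equal slices and call `rms` on each, the same way the pydub code slices by milliseconds.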

I'll use pydub, a lightweight wrapper I wrote around the standard library's wave module:

from pydub import AudioSegment

# first, open the audio file (pydub needs ffmpeg or libav to decode MP3)
sound = AudioSegment.from_mp3("some_song.mp3")

# break the sound into 180 even chunks (or however
# many pixels wide the image should be); len(sound) is in milliseconds
chunk_length = len(sound) // 180

loudness_of_chunks = []
for i in range(180):
    start = i * chunk_length
    end = start + chunk_length

    chunk = sound[start:end]
    loudness_of_chunks.append(chunk.rms)

The for loop can also be written as the following list comprehension; I used the explicit loop above for clarity:

loudness_of_chunks = [
    sound[ i*chunk_length : (i+1)*chunk_length ].rms
    for i in range(180)]
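One caveat with a fixed chunk length: if it is rounded down to whole milliseconds (which integer `/` does in Python 2, and `//` does in Python 3), the final `len(sound) % 180` milliseconds fall outside every chunk. A small sketch that instead derives each chunk's boundaries from the total length, so the chunks always end exactly at the end of the audio; `chunk_bounds` is a hypothetical helper, not part of pydub:

```python
def chunk_bounds(total_ms, n_chunks):
    """Split [0, total_ms) into n_chunks integer (start, end) ranges whose
    lengths differ by at most 1 and which together cover every millisecond."""
    return [
        (i * total_ms // n_chunks, (i + 1) * total_ms // n_chunks)
        for i in range(n_chunks)
    ]

print(chunk_bounds(1000, 3))  # [(0, 333), (333, 666), (666, 1000)]

# With pydub this would become, for example:
# loudness_of_chunks = [sound[start:end].rms
#                       for start, end in chunk_bounds(len(sound), 180)]
```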

Now the only thing left to do is scale the RMS values to a 0 to 180 range (since you want the image to be 180 px tall):

# float() avoids integer division in Python 2; "or 1.0" guards against pure silence
max_rms = float(max(loudness_of_chunks)) or 1.0

scaled_loudness = [(loudness / max_rms) * 180 for loudness in loudness_of_chunks]

I'll leave drawing the actual pixels to you; I'm not very experienced with PIL or ImageMagick :/
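For the drawing step, here is a sketch that needs no imaging library: it turns the scaled heights into a 180×180 grayscale pixel grid (one dark bar per column, growing up from the bottom edge) and writes it as a PNG using only `zlib` and `struct`. The helper names `bars_to_pixels` and `write_png` are hypothetical; with PIL you would draw the same bars with `ImageDraw`'s `line` method.

```python
import struct
import zlib

def bars_to_pixels(heights, width=180, height=180, bg=255, fg=0):
    """Return `height` rows of `width` grayscale bytes with one vertical bar
    per column; row 0 is the top of the image, so bars fill from the bottom."""
    rows = [bytearray([bg]) * width for _ in range(height)]
    for x, h in enumerate(heights[:width]):
        h = max(0, min(height, int(round(h))))  # clamp to the canvas
        for y in range(height - h, height):
            rows[y][x] = fg
    return rows

def _chunk(tag, data):
    # PNG chunk layout: length, type, data, then CRC-32 over type + data
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))

def write_png(rows, path):
    """Write 8-bit grayscale rows (equal-length bytearrays) to a PNG file."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlacing
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # each scanline is prefixed with filter type 0 (None)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
        f.write(_chunk(b"IHDR", ihdr))
        f.write(_chunk(b"IDAT", zlib.compress(raw)))
        f.write(_chunk(b"IEND", b""))

# usage, with scaled_loudness from the code above:
# write_png(bars_to_pixels(scaled_loudness), "waveform.png")
```

Because rows are stored top to bottom, a bar of height h fills the bottom h rows; Remco's PIL code below handles the downward-pointing y axis the same way with `y0 = img_height - rms`.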

Jiaaro, answered Oct 14 '22 at 14:10


Based on Jiaaro's answer (thanks for writing pydub!) and built for web2py, here's my two cents:

import io
import os

import pydub
from PIL import Image, ImageDraw, ImageFilter

def generate_waveform():
    img_width = 1170
    img_height = 140
    line_color = 180  # gray level of the bars (0 = black, 255 = white)
    # `request` and `response` are web2py's request/response globals
    filename = os.path.join(request.folder, 'static', 'sounds', 'adg3.mp3')

    # first, open the audio file
    sound = pydub.AudioSegment.from_mp3(filename)

    # break the sound into img_width even chunks (one per pixel column)
    chunk_length = len(sound) // img_width

    loudness_of_chunks = [
        sound[i*chunk_length : (i+1)*chunk_length].rms
        for i in range(img_width)
    ]
    max_rms = float(max(loudness_of_chunks)) or 1.0  # guard against pure silence
    scaled_loudness = [round(loudness * img_height / max_rms) for loudness in loudness_of_chunks]

    # now convert the scaled loudness to an image: a white grayscale canvas
    # with one bar per column, drawn up from the bottom (PIL's y axis points down)
    im = Image.new('L', (img_width, img_height), color=255)
    draw = ImageDraw.Draw(im)
    for x, rms in enumerate(scaled_loudness):
        y0 = img_height - rms
        y1 = img_height
        draw.line((x, y0, x, y1), fill=line_color, width=1)
    del draw
    im = im.filter(ImageFilter.SMOOTH).filter(ImageFilter.DETAIL)

    # io.BytesIO works on Python 2 and 3 (cStringIO is Python 2 only)
    buffer = io.BytesIO()
    im.save(buffer, 'PNG')
    buffer.seek(0)
    return response.stream(buffer, filename=filename + '.png')
Remco, answered Oct 14 '22 at 14:10