I want to represent an audio file in an image with a maximum size of 180×180 pixels.
I want to generate this image so that it somehow represents the audio file; think of something like SoundCloud's waveform (an amplitude graph).
I wonder if any of you have something for this. I have been searching around for a bit, mainly for "audio visualization" and "audio thumbnailing", but I have not found anything useful.
I first posted this to ux.stackexchange.com; this is my attempt to reach any programmers working on this.
You could also break up the audio into chunks and measure the RMS (a measure of loudness) of each one. Let's say you want an image that is 180 pixels wide.
I'll use pydub, a lightweight wrapper I wrote around the standard library's wave module:
from pydub import AudioSegment

# first I'll open the audio file
sound = AudioSegment.from_mp3("some_song.mp3")

# break the sound into 180 even chunks (or however
# many pixels wide the image should be)
chunk_length = len(sound) / 180

loudness_of_chunks = []
for i in range(180):
    # slice positions are in milliseconds
    start = i * chunk_length
    end = start + chunk_length
    chunk = sound[start:end]
    loudness_of_chunks.append(chunk.rms)
The for loop can also be written as the following list comprehension; I spelled it out above just to keep it clear:
loudness_of_chunks = [
    sound[i * chunk_length : (i + 1) * chunk_length].rms
    for i in range(180)
]
Now the only thing left to do is scale the RMS values down to a 0 to 180 range (since you want the image to be 180px tall):
# float() keeps the division below from truncating under Python 2
max_rms = float(max(loudness_of_chunks))
scaled_loudness = [(loudness / max_rms) * 180 for loudness in loudness_of_chunks]
I'll leave the drawing of the actual pixels to you; I'm not very experienced with PIL or ImageMagick :/
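To make the chunk-and-scale idea concrete without needing an audio file, here is a minimal sketch that works on a plain list of samples using only the standard library. The helper names `chunk_rms` and `scale_to_height` and the synthetic fading tone are assumptions made for illustration; they are not part of pydub, which computes `.rms` for you.

```python
import math

def chunk_rms(samples, n_chunks):
    """Split samples into n_chunks roughly equal slices and return each slice's RMS."""
    chunk_length = len(samples) / float(n_chunks)
    result = []
    for i in range(n_chunks):
        start = int(i * chunk_length)
        end = int((i + 1) * chunk_length)
        chunk = samples[start:end]
        if not chunk:
            # Fewer samples than chunks: treat the empty slice as silence.
            result.append(0.0)
            continue
        mean_square = sum(s * s for s in chunk) / float(len(chunk))
        result.append(math.sqrt(mean_square))
    return result

def scale_to_height(values, height):
    """Scale values so the loudest one maps to `height` pixels."""
    peak = max(values)
    if peak == 0:
        # Completely silent input: avoid dividing by zero.
        return [0] * len(values)
    return [int(round(v * height / peak)) for v in values]

# Synthetic input (an assumption for the demo): one second of a 440 Hz tone
# sampled at 8 kHz, fading out linearly.
sample_rate = 8000
samples = [
    (1.0 - n / float(sample_rate)) * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]

loudness = chunk_rms(samples, 180)
heights = scale_to_height(loudness, 180)
```

Each entry of `heights` is then the bar height, in pixels, for one column of a 180×180 image. The guard for silent input is a choice made in this sketch; the snippets above would divide by zero on an all-silent file.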
Based on Jiaaro's answer (thanks for writing pydub!) and built for web2py, here's my two cents:
import io
import os

import pydub
from PIL import Image, ImageDraw, ImageFilter

def generate_waveform():
    # `request` and `response` are globals that web2py provides to controllers
    img_width = 1170
    img_height = 140
    line_color = 180
    filename = os.path.join(request.folder, 'static', 'sounds', 'adg3.mp3')

    # first I'll open the audio file
    sound = pydub.AudioSegment.from_mp3(filename)

    # break the sound into img_width even chunks
    # (one chunk per pixel column of the image)
    chunk_length = len(sound) / img_width
    loudness_of_chunks = [
        sound[i * chunk_length : (i + 1) * chunk_length].rms
        for i in range(img_width)
    ]
    max_rms = float(max(loudness_of_chunks))
    scaled_loudness = [round(loudness * img_height / max_rms) for loudness in loudness_of_chunks]

    # now convert the scaled_loudness to an image:
    # a white 8-bit grayscale canvas with one vertical gray line per chunk
    im = Image.new('L', (img_width, img_height), color=255)
    draw = ImageDraw.Draw(im)
    for x, rms in enumerate(scaled_loudness):
        y0 = img_height - rms
        y1 = img_height
        draw.line((x, y0, x, y1), fill=line_color, width=1)
    del draw

    im = im.filter(ImageFilter.SMOOTH).filter(ImageFilter.DETAIL)

    # io.BytesIO replaces cStringIO, which only exists on Python 2
    buffer = io.BytesIO()
    im.save(buffer, 'PNG')
    buffer.seek(0)
    return response.stream(buffer, filename=filename + '.png')
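If you want to check the drawing step without web2py or PIL installed, the bar logic can be sketched with the standard library alone. The helpers `render_columns` and `to_pgm` below are hypothetical names made up for this example; they mimic the bottom-anchored vertical lines from the loop above and serialize the result as a plain-text PGM (Netpbm "P2") image, which is meant for small illustrations rather than production output.

```python
def render_columns(heights, img_height, background=255, line_color=180):
    """Return rows of grayscale pixels with one bottom-anchored bar per column."""
    rows = []
    for y in range(img_height):
        # Row 0 is the top of the image, so a bar of height h covers
        # rows img_height - h through img_height - 1.
        rows.append([
            line_color if y >= img_height - h else background
            for h in heights
        ])
    return rows

def to_pgm(rows, max_value=255):
    """Serialize the pixel rows as a plain-text (P2) PGM image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    lines = ["P2", "%d %d" % (width, height), str(max_value)]
    lines.extend(" ".join(str(p) for p in row) for row in rows)
    return "\n".join(lines) + "\n"

# Tiny example: four columns in a 4-pixel-tall image.
heights = [0, 1, 2, 4]
rows = render_columns(heights, img_height=4)
pgm_text = to_pgm(rows)
```

Writing `pgm_text` to a file with a `.pgm` extension should give something an image viewer that supports Netpbm can open. For real thumbnails, the PIL version above is the more practical route.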