I want to represent an audio file as an image with a maximum size of 180×180 pixels. The image should give some visual representation of the audio, along the lines of SoundCloud's waveform (an amplitude graph). <img src="https://i.imgur.com/5j4Tq.png" alt="Screenshot of Soundcloud's player"> Does anyone have a technique or library for this? I have been searching for a while, mainly for "audio visualization" and "audio thumbnailing", but I have not found anything useful. I first posted this on ux.stackexchange.com; this is my attempt to reach programmers who have worked on this.


Generate thumbnail for arbitrary audio file

2 Answers

You could also break the audio up into chunks and measure the RMS (a measure of loudness) of each chunk. Let's say you want an image that is 180 pixels wide.

I'll use pydub, a lightweight wrapper I wrote around the standard library wave module:

from pydub import AudioSegment

# first I'll open the audio file
sound = AudioSegment.from_mp3("some_song.mp3")

# break the sound into 180 even chunks (or however
# many pixels wide the image should be);
# len(sound) is the duration in milliseconds
chunk_length = len(sound) / 180

loudness_of_chunks = []
for i in range(180):
    start = i * chunk_length
    end = start + chunk_length

    chunk = sound[start:end]
    loudness_of_chunks.append(chunk.rms)

The for loop can be written as the following list comprehension; I just spelled it out above for clarity:

loudness_of_chunks = [
    sound[ i*chunk_length : (i+1)*chunk_length ].rms
    for i in range(180)]
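
To make the RMS step concrete without any audio library, here is a minimal sketch that works on a plain list of sample values. The function name `chunk_rms` and the toy input are assumptions made for illustration and are not part of pydub; the idea is only to show what "loudness per chunk" computes.

```python
import math

def chunk_rms(samples, n_chunks):
    """Split samples into n_chunks roughly even pieces and return each piece's RMS."""
    size = len(samples) / n_chunks  # may be fractional, so boundaries are truncated below
    result = []
    for i in range(n_chunks):
        chunk = samples[int(i * size):int((i + 1) * size)]
        if not chunk:
            # Fewer samples than chunks: treat the empty piece as silence
            result.append(0.0)
            continue
        # Root of the mean of the squared sample values
        result.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return result

quiet_then_loud = [1, -1, 1, -1, 10, -10, 10, -10]
print(chunk_rms(quiet_then_loud, 2))  # [1.0, 10.0]
```

The sign of each sample does not matter because the values are squared, which is why RMS tracks loudness rather than the raw waveform shape.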

Now the only thing left to do is scale the RMS values down to a 0-180 scale (since you want the image to be 180px tall):

# float() keeps the division below from truncating to 0 on Python 2
max_rms = float(max(loudness_of_chunks))

scaled_loudness = [ (loudness / max_rms) * 180 for loudness in loudness_of_chunks]
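
One edge case the scaling step does not cover is a completely silent file, where the maximum RMS is 0 and the division fails. A minimal sketch of a guarded version follows; the helper name `scale_to_height` is hypothetical and the rounding to whole pixels is an assumption about what the drawing code will want.

```python
def scale_to_height(values, height):
    """Scale non-negative values so the largest one maps to `height` pixels."""
    peak = max(values, default=0)
    if peak == 0:
        # Silent (or empty) input: every bar has zero height
        return [0] * len(values)
    return [round(v / peak * height) for v in values]

print(scale_to_height([0, 50, 100, 200], 180))  # [0, 45, 90, 180]
```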

I'll leave the drawing of the actual pixels to you; I'm not very experienced with PIL or ImageMagick :/
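
For the drawing step, one option that needs no imaging library at all is to write a binary PGM (a very simple grayscale format) and convert it afterwards with a tool such as ImageMagick. This is a swapped-in technique rather than the PIL approach mentioned above, and the names `render_bars` and `write_pgm` are hypothetical helpers for this sketch. It assumes the bar heights have already been scaled to pixel units.

```python
import os
import tempfile

def render_bars(heights, img_height, bar=0, background=255):
    """Return grayscale rows (top to bottom) with one bottom-anchored bar per column."""
    rows = []
    for y in range(img_height):
        # A pixel belongs to the bar when it lies within `h` pixels of the bottom edge
        rows.append(bytes(bar if img_height - y <= h else background for h in heights))
    return rows

def write_pgm(path, rows):
    """Write rows of grayscale bytes as a binary PGM (P5) file."""
    width, height = len(rows[0]), len(rows)
    with open(path, "wb") as f:
        f.write(b"P5\n%d %d\n255\n" % (width, height))
        f.writelines(rows)

# Tiny 3x3 demo: columns with bar heights 0, 1 and 3 pixels
rows = render_bars([0, 1, 3], img_height=3)
out_path = os.path.join(tempfile.gettempdir(), "waveform.pgm")
write_pgm(out_path, rows)
```

For the 180×180 thumbnail, `heights` would be the 180 scaled loudness values and `img_height` would be 180.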

172

answered Oct 14 '22 14:10

Jiaaro

Based on Jiaaro's answer (thanks for writing pydub!) and built for web2py, here's my two cents:

import io
import os

import pydub
from PIL import Image, ImageDraw, ImageFilter

# `request` and `response` are globals that web2py provides to controllers

def generate_waveform():
    img_width = 1170
    img_height = 140
    line_color = 180
    filename = os.path.join(request.folder, 'static', 'sounds', 'adg3.mp3')

    # first I'll open the audio file
    sound = pydub.AudioSegment.from_mp3(filename)

    # break the sound into img_width even chunks,
    # one per pixel column of the image
    chunk_length = len(sound) / img_width

    loudness_of_chunks = [
        sound[i*chunk_length : (i+1)*chunk_length].rms
        for i in range(img_width)
    ]
    max_rms = float(max(loudness_of_chunks))
    scaled_loudness = [round(loudness * img_height / max_rms) for loudness in loudness_of_chunks]

    # now convert the scaled_loudness to an image:
    # one vertical line per chunk, anchored at the bottom edge
    im = Image.new('L', (img_width, img_height), color=255)
    draw = ImageDraw.Draw(im)
    for x, rms in enumerate(scaled_loudness):
        y0 = img_height - rms
        y1 = img_height
        draw.line((x, y0, x, y1), fill=line_color, width=1)
    del draw
    im = im.filter(ImageFilter.SMOOTH).filter(ImageFilter.DETAIL)

    # in-memory binary buffer (io.BytesIO replaces Python 2's cStringIO)
    buffer = io.BytesIO()
    im.save(buffer, 'PNG')
    buffer.seek(0)
    return response.stream(buffer, filename=filename+'.png')

answered Oct 14 '22 14:10

Remco


Tags:

python

visualization

audio

joar
