Java algorithm for normalizing audio

Tags: java, math, audio

I'm trying to normalize an audio file of speech.

Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter.

I know very little about audio manipulation, beyond what I've learnt from working on this task. Also, my math is embarrassingly weak.

I've done some research, and the Xuggle site provides a sample that reduces the volume using the following code:

@Override
public void onAudioSamples(IAudioSamplesEvent event)
{
  // get the raw audio bytes and adjust their values
  ShortBuffer buffer = event.getAudioSamples().getByteBuffer().asShortBuffer();
  for (int i = 0; i < buffer.limit(); ++i)
    buffer.put(i, (short)(buffer.get(i) * mVolume));

  super.onAudioSamples(event);
}

Here, they scale every sample in getAudioSamples() by a constant factor, mVolume.
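For illustration, here is a minimal sketch of that kind of constant-gain scaling in isolation, without Xuggle. The class name GainSketch and the helper applyGain are made up for this example and are not part of the Xuggle sample. The clamping step is an addition, on the assumption that a gain above 1.0 might be used, because a plain (short) cast wraps around instead of saturating.

```java
public final class GainSketch {

    /** Scales one 16-bit sample by a constant gain and clamps the result to the short range. */
    static short applyGain(short sample, double gain) {
        long scaled = Math.round(sample * gain);
        if (scaled > Short.MAX_VALUE) return Short.MAX_VALUE;
        if (scaled < Short.MIN_VALUE) return Short.MIN_VALUE;
        return (short) scaled;
    }

    public static void main(String[] args) {
        System.out.println(applyGain((short) 10000, 0.5)); // 5000
        System.out.println(applyGain((short) 30000, 2.0)); // 32767 (clamped)
        System.out.println((short) (30000 * 2.0));         // -5536 (plain cast wraps around)
    }
}
```

With a gain below 1.0 the clamp never triggers, and the only difference from the cast in the loop above is rounding instead of truncation.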

Building on this approach, I've attempted a normalisation that modifies the samples in getAudioSamples() to a normalised value, taking into account the max/min in the file (see below for details). I have a simple filter to leave "silence" alone (i.e., anything below a threshold value).

I'm finding that the output file is very noisy (i.e., the quality is seriously degraded). I assume that the error is either in my normalisation algorithm or in the way I manipulate the bytes. However, I'm unsure of where to go next.

Here's an abridged version of what I'm currently doing.

Step 1: Find peaks in file:

Reads the full audio file and finds the highest and lowest values of buffer.get() across all AudioSamples.

    @Override
    public void onAudioSamples(IAudioSamplesEvent event) {
        IAudioSamples audioSamples = event.getAudioSamples();
        ShortBuffer buffer = 
           audioSamples.getByteBuffer().asShortBuffer();

        short min = Short.MAX_VALUE;
        short max = Short.MIN_VALUE;
        for (int i = 0; i < buffer.limit(); ++i) {
            short value = buffer.get(i);
            min = (short) Math.min(min, value);
            max = (short) Math.max(max, value);
        }
        // assignment of min/max omitted for brevity.
        super.onAudioSamples(event);
    }

Step 2: Normalize all values:

In a loop similar to Step 1, replace the buffer contents with normalized values, calling:

    buffer.put(i, normalize(buffer.get(i)));

public short normalize(short value) {
    if (isBackgroundNoise(value))
        return value;

    short rawMin = // min from step1
    short rawMax = // max from step1
    short targetRangeMin = 1000;
    short targetRangeMax = 8000;

    int abs = Math.abs(value);
    double a = (abs - rawMin) * (targetRangeMax - targetRangeMin);
    double b = (rawMax - rawMin);
    double result = targetRangeMin + ( a/b );

    // Copy the sign of value to result.
    result = Math.copySign(result,value);
    return (short) result;
}

Questions:

Is this a valid approach for attempting to normalize an audio file?
Is my math in normalize() valid?
Why would this cause the file to become noisy, where a similar approach in the demo code doesn't?

asked Sep 18 '12 01:09

Marty Pitt

1 Answer

I don't think the concept of "minimum sample value" is very meaningful, since the sample value just represents the current "height" of the sound wave at a certain time instant. I.e. its absolute value will vary between the peak value of the audio clip and zero. Thus, having a targetRangeMin seems to be wrong and will probably cause some distortion of the waveform.
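To make the effect concrete, here is a small sketch that runs the question's mapping on a few samples around a zero crossing. It assumes rawMin = -10000 and rawMax = 10000 (what Step 1 would find for a sine wave of amplitude 10000), passes them in as parameters, and leaves out isBackgroundNoise. The class and method names are made up for the example.

```java
public final class OriginalMappingDemo {
    static final short TARGET_RANGE_MIN = 1000;
    static final short TARGET_RANGE_MAX = 8000;

    /** The question's linear remapping, with rawMin/rawMax as parameters and no silence filter. */
    static short normalizeOriginal(short value, short rawMin, short rawMax) {
        int abs = Math.abs(value);
        double a = (double) (abs - rawMin) * (TARGET_RANGE_MAX - TARGET_RANGE_MIN);
        double b = rawMax - rawMin;
        double result = Math.copySign(TARGET_RANGE_MIN + a / b, value);
        return (short) result;
    }

    public static void main(String[] args) {
        short rawMin = -10000; // assumed: signed minimum of a sine with amplitude 10000
        short rawMax = 10000;  // assumed: signed maximum of the same sine
        for (short v : new short[] {-2, -1, 1, 2, 10000}) {
            System.out.println(v + " -> " + normalizeOriginal(v, rawMin, rawMax));
        }
        // -2 -> -4500
        // -1 -> -4500
        // 1 -> 4500
        // 2 -> 4500
        // 10000 -> 8000
    }
}
```

Neighbouring samples -1 and 1 end up about 9000 apart, so every zero crossing turns into a step. Even with rawMin = 0 the jump would be from -1000 to 1000, because targetRangeMin acts as a floor on the magnitude.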

I think a better approach might be to have some sort of weight function that decreases the sample value based on its size, i.e. bigger values are decreased by a larger percentage than smaller values. This would also introduce some distortion, but it would probably not be very noticeable.
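One way to write such a weight function, as a sketch: let R be the peak magnitude found in Step 1 and T the desired peak (both symbols are introduced here for illustration). Scaling each sample x by a factor that shrinks linearly with |x| gives the form used by the sample implementation below, with m playing the role of maxReduce:

```latex
y = x\left(1 - m\,\frac{|x|}{R}\right), \qquad m = 1 - \frac{T}{R}

% At the peak, |x| = R:   |y| = R\,(1 - m) = T
% Near zero, |x| \ll R:   y \approx x
% For x \ge 0:            \frac{dy}{dx} = 1 - \frac{2 m x}{R}
```

The curve passes smoothly through zero, so there is no jump at zero crossings. The derivative stays non-negative on the whole range only when m <= 1/2, that is when T >= R/2; for a smaller target, mid-level samples would come out larger than T, so the target probably should not be set below half of the raw peak with this particular function.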

Edit: here is a sample implementation of such a method:

public short normalize(short value) {
    short rawMax = // max from step1
    short targetMax = 8000;

    //This is the maximum volume reduction
    double maxReduce = 1 - targetMax/(double)rawMax;

    int abs = Math.abs(value);
    double factor = (maxReduce * abs/(double)rawMax);

    return (short) Math.round((1 - factor) * value); 
}
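As a quick sanity check, the method above can be run on a few sample values. This sketch passes rawMax in as a parameter and assumes rawMax = 10000, matching the sine-curve example; the class and method names are made up for illustration.

```java
public final class SuggestedMappingDemo {
    static final short TARGET_MAX = 8000;

    /** The weighted reduction from the answer, with rawMax passed in as a parameter. */
    static short normalizeSuggested(short value, short rawMax) {
        // Maximum volume reduction, applied only at the peak.
        double maxReduce = 1 - TARGET_MAX / (double) rawMax;
        // Reduction grows linearly with the sample's magnitude.
        double factor = maxReduce * Math.abs(value) / (double) rawMax;
        return (short) Math.round((1 - factor) * value);
    }

    public static void main(String[] args) {
        short rawMax = 10000; // assumed peak, as in the sine-curve example
        for (short v : new short[] {-10000, -5000, -1, 0, 1, 5000, 10000}) {
            System.out.println(v + " -> " + normalizeSuggested(v, rawMax));
        }
        // -10000 -> -8000
        // -5000 -> -4500
        // -1 -> -1
        // 0 -> 0
        // 1 -> 1
        // 5000 -> 4500
        // 10000 -> 8000
    }
}
```

Samples near zero pass through essentially unchanged, the peaks land on targetMax, and nothing jumps at the zero crossing.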

For reference, this is what your algorithm did to a sine curve with an amplitude of 10000:

[Image: Original algorithm (https://i.stack.imgur.com/0Jsg9.png)]

This explains why the audio quality becomes much worse after being normalized.

This is the result after running with my suggested normalize method:

[Image: Suggested algorithm (https://i.stack.imgur.com/1gDgc.png)]

answered Sep 29 '22 01:09

Petter
