
How to insert silence of a specific duration at an arbitrary position in an MP3 file?

Scenario

I have a bunch of MP3 files: some have a constant bitrate, others a variable bitrate; some are encoded at 128 kbps, some at other bitrates; some are stereo and some are joint stereo. All are sampled at 44,100 Hz.

In order to automate a task with these thousands of MP3 files, I'm trying to develop an algorithm that inserts silence of an arbitrary duration at arbitrary positions in these files (e.g. insert 500 ms of silence into one MP3 file at position 00:02:30, then insert 750 ms of silence into another MP3 file at position 00:40:02).

Research

The only information I found is about inserting silence at the start or at the end of an MP3 file. That is not what I want, because I need to insert silence at an arbitrary position. For most of the files I will need to add silence near the middle, and only occasionally at the start. I will never need to add silence at the end of the file.

Some people suggest using the SoX or FFmpeg command-line applications to insert silence at the start or the end of an MP3 file. I don't know whether these tools could serve my purpose, but in any case my objective is to do this in C# or VB.NET without depending on any third-party application, so that I have total control over the modifications made to the file and can programmatically handle the resulting file to perform other tasks with it (inserting silence is just one of the things I need to do with these MP3 files).

I am fine with depending on an external library, though. I remembered NAudio for .NET, a great library for audio manipulation, and I found this interesting snippet, which is not about inserting silence but about concatenating files:

https://markheath.net/post/concatenating-sample-providers-in-naudio

I think NAudio gives me a chance to develop an algorithm that inserts silence of a specific duration at a specific position.

Approaches

Clearly I don't know enough to work out how to do this task.

The first approach I came up with is to insert a run of zeroes at a specific position in the stream. I know how to do that, but how am I supposed to translate a zero byte into milliseconds in order to calculate the duration of the silence inserted into the MP3 file? I don't know whether inserting a sequence of zeroes will work as silence; if it does, I don't know how to translate that sequence of zeroes into time; and I don't know whether this approach would be safe for all kinds of MP3 variants (CBR, VBR, ABR, mono or stereo, etc.).

The second approach I thought of is to use audio editor software to generate an MP3 file consisting of 1 millisecond of silence, and then insert that silence as many times as required at a specific position in the MP3 file stream. I think I would need to generate this 1 ms MP3 file for every possible CBR bitrate, but what about VBR and ABR? I'm stuck with this idea.

In the end things will probably be much easier than I think, and surely NAudio could help me accomplish this task, or at least a big part of it, with less effort.

Question

How can I insert silence of a specific duration at a specific position in an MP3 file of undetermined format (which could be CBR, VBR, ABR, mono or stereo, joint stereo, 128 or 320 kbps, etc.) using C# or VB.NET, with or without the help of NAudio or another .NET library?

Requirements

  • NOT USING THIRD-PARTY COMMAND-LINE APPLICATIONS, nor automating GUI apps.

  • The file modifications should be done without audio loss, that is, without re-encoding the file, in the same way that mp3DirectCut, for example, lets you insert silence or cut and paste without re-encoding.

  • Preferably, the solution would be a reusable, universal function like the one below; I came up with this prototype to try to simplify things:

     public static MemoryStream InsertSilence(
         Stream inputFile,         // pass the raw file stream data
         TimeSpan startPosition,   // e.g. new TimeSpan(0, 2, 10)
         TimeSpan silenceDuration  // e.g. TimeSpan.FromSeconds(10)
     ) {
         // Do the work, save the data into a new stream and return it.
         return null;
     }
    
ElektroStudios asked Nov 06 '22 03:11


1 Answer

any manipulation of digital audio happens while the audio is in PCM format, also called raw audio ... every audio codec ( mp3 etc. ) can be decoded into PCM -> do your manipulations -> then encode the PCM back into any audio codec

once in PCM format identify the range your audio curve wobbles across to determine its zero crossing ... in PCM each audio sample ( point on the audio curve ) is typically an integer ( could be a 16 bit int, or 24 bit or 32 bit, etc. ) ... so if it's an unsigned 16 bit integer its values vary from 0 to 2^16 - 1 ( 0 to 65535 ) in which case its zero crossing is the middle value of that range ... also pay attention to whether you have signed or unsigned integers ... in WAV files 8 bit samples are unsigned while 16 bit and wider samples are signed ... unsigned integers can only have values from zero on up whereas signed integers can store negative values ... if you have signed integers your zero crossing value is zero ... in either case the zero crossing is the middle value of your integer's possible range
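As a rough illustration of the midpoint rule described above, here is a minimal C# sketch. The class and method names (`PcmSilence`, `MidpointValue`) are made up for this example and do not come from NAudio or any other library; it only assumes integer PCM of up to 32 bits.

```csharp
using System;

public static class PcmSilence
{
    // Returns the sample value that represents silence for integer PCM.
    // Signed formats are centred on 0; unsigned formats are centred on
    // 2^(bits - 1), e.g. 128 for 8 bit and 32768 for 16 bit.
    public static long MidpointValue(int bitsPerSample, bool isSigned)
    {
        if (bitsPerSample < 1 || bitsPerSample > 32)
            throw new ArgumentOutOfRangeException(nameof(bitsPerSample));

        return isSigned ? 0L : 1L << (bitsPerSample - 1);
    }
}
```

For example, `PcmSilence.MidpointValue(8, false)` gives 128 and `PcmSilence.MidpointValue(16, true)` gives 0.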

to add silence you insert a series of values into your PCM array, each equal to your zero crossing value, which you know from the sample bit depth and signedness
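The step above, and the question of how many zero values make up a given number of milliseconds, can be sketched as follows. This is only a sketch under stated assumptions: the audio is already decoded to interleaved signed 16 bit PCM (so silence is the value 0), and `PcmEditor.InsertSilence` is a hypothetical helper written for this example, not a library API. The key relation is that one second of audio holds `sampleRate` frames, and each frame holds one sample per channel.

```csharp
using System;

public static class PcmEditor
{
    // Inserts silence into interleaved signed 16 bit PCM samples (assumption:
    // signed samples, so silence is the value 0).
    public static short[] InsertSilence(short[] samples, int sampleRate, int channels,
                                        TimeSpan position, TimeSpan duration)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        if (position < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(position));
        if (duration < TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(duration));

        // Convert times to frame counts (one frame = one sample per channel).
        long totalFrames = samples.Length / channels;
        long startFrame = Math.Min((long)Math.Round(position.TotalSeconds * sampleRate), totalFrames);
        long silentFrames = (long)Math.Round(duration.TotalSeconds * sampleRate);

        // Convert frame counts to indices into the interleaved array.
        int start = checked((int)(startFrame * channels));
        int gap = checked((int)(silentFrames * channels));

        // A new array is zero-filled, so the gap is already silent.
        var result = new short[checked(samples.Length + gap)];
        Array.Copy(samples, 0, result, 0, start);
        Array.Copy(samples, start, result, start + gap, samples.Length - start);
        return result;
    }
}
```

With 44,100 Hz mono audio, 500 ms of silence works out to 22,050 inserted samples; with stereo it is 44,100 values, because every frame carries two samples. A position past the end of the audio is clamped, so the silence is appended.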

pay attention to the notion of endianness ... a canonical WAV file has a 44 byte header section followed by a payload in PCM format ( headers that carry extra chunks are longer ) ... as you walk across the payload to parse the next audio sample, if your bit depth ( as identified in the header section ) is say 16 bits then an audio sample takes two bytes and endianness determines whether the most significant byte comes first or last in this set of bytes
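To make the byte order concrete, here is a small C# sketch that turns a raw payload into samples. It assumes signed 16 bit little-endian data, which is what standard RIFF/WAV files use (least significant byte first); `PcmBytes.DecodeInt16LittleEndian` is a name invented for this example, and the payload is assumed to have already been separated from the header.

```csharp
using System;

public static class PcmBytes
{
    // Decodes little-endian signed 16 bit PCM bytes into samples.
    // In each pair, the first byte is the low byte and the second is the high byte.
    public static short[] DecodeInt16LittleEndian(byte[] payload)
    {
        if (payload == null) throw new ArgumentNullException(nameof(payload));
        if (payload.Length % 2 != 0)
            throw new ArgumentException("16 bit PCM needs an even number of bytes.", nameof(payload));

        var samples = new short[payload.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            int lo = payload[2 * i];
            int hi = payload[2 * i + 1];
            // unchecked: values above 0x7FFF wrap into the negative range on purpose.
            samples[i] = unchecked((short)(lo | (hi << 8)));
        }
        return samples;
    }
}
```

For instance, the bytes `0x34 0x12` decode to the sample 0x1234, while a big-endian reader would see 0x3412 from the same two bytes.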

it's easiest to use mono, and I highly suggest you get your code working using only mono and not multi channel like stereo ... only add multi channel once you reach success with mono

top tip: first convert your mp3 into WAV, then do your manipulation, then encode back into mp3

Scott Stensland answered Nov 09 '22 23:11