 

How to separate an audio file based on different speakers

I have a bunch of audio files of telephone conversations. I want to split each file into two, each containing only one speaker's speech. Maybe I need to use speaker diarization, but how can I do that? Can anybody give me some clues? Thank you. PS: Linux OS, C/C++.

asked Oct 18 '12 at 18:10 by Bo Liu


1 Answer

While separating the individual speakers is quite a difficult problem, you can automatically split the audio wherever there are pauses. This produces a series of shorter files that are likely easier to manage, since speakers usually take turns at pauses.

This approach requires the open-source Julius speech recognition decoder package, which is available in many Linux package repositories. I use the one in the Ubuntu multiverse repository.

Here is the site: http://julius.sourceforge.jp/en_index.php


Step 0: Install Julius

sudo apt-get install julius

Step 1: Segment Audio

adintool -in file -out file -filename myRecording.wav -startid 0 -freq 44100 -lv 2048 -zc 30 -headmargin 600 -tailmargin 600
  • -startid is the starting segment number that will be appended to the filename

  • -freq is the sample rate of the source audio file

  • -lv is the level of the audio above which voice detection will be active

  • -zc is the number of zero crossings per second above which voice detection will be active

  • -headmargin and -tailmargin are the amounts of silence, in milliseconds, kept before and after each audio segment

Note that -lv and -zc will have to be adjusted to the characteristics of your particular recording, while -headmargin and -tailmargin will have to be adjusted to your particular speakers' styles. The values given above, however, have worked well for my voice recordings in the past.

Here is the documentation: http://julius.sourceforge.jp/juliusbook/en/adintool.html


In my experience, preprocessing the audio with compression and normalization gives better results and requires less tuning of the Julius arguments. These initial steps are recommended but not required.

This preprocessing requires the open-source SoX audio toolkit, which is also available in many Linux package repositories. I use the one in the Ubuntu universe repository.

Here is the site: http://sox.sourceforge.net


Step -2: Install SoX

sudo apt-get install sox

Step -1: Preprocess Audio

sox myOriginalRecording.wav myRecording.wav gain -b -n -8 compand 0.2,0.6 4:-48,-32,-24 0 -64 0.2 gain -b -n -2
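  • gain -b -n normalizes the audio so that its peak sits at the given level in dBFS (-8 dB before compand, -2 dB after it); -b also balances multichannel audio, with clipping protection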

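  • compand compresses (in this case) the dynamic range of the audio according to its parameters: attack and decay times (0.2,0.6), soft knee and transfer function (4:-48,-32,-24), output gain (0), initial volume (-64) and delay (0.2)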

Note that compand's parameters may take some time to fully understand. The values given above, however, have worked well for my voice recordings in the past.

Here is the documentation: http://sox.sourceforge.net/sox.html


While this will not identify each speaker, it will greatly simplify the task of doing so by ear, which may remain the only option for a while. But I do hope you find a practical solution if one is already available.

answered Oct 01 '22 at 06:10 by Kelly Christoffersen