how to separate an audio file based on different speakers

Question

I have a bunch of audio files of telephone conversations. I want to split each audio file into two, each containing only one speaker's speech. Maybe I need to use speaker diarization, but how can I do that? Can anybody give me some clues? Thank you.

PS: Linux OS, C/C++.

Kelly Christoffersen · Accepted Answer

While separating the individual speakers is quite a difficult problem, you can automatically split the audio wherever there are pauses. This produces a series of files that are likely easier to manage, since speakers often alternate between pauses.

This approach requires the open source Julius speech recognition decoder package. This is available in many Linux package repositories. I use the Ubuntu multiverse repository.

Here is the site: http://julius.sourceforge.jp/en_index.php

Step 0: Install Julius

sudo apt-get install julius

Step 1: Segment Audio

adintool -in file -out file -filename myRecording.wav -startid 0 -freq 44100 -lv 2048 -zc 30 -headmargin 600 -tailmargin 600

-startid is the starting segment number that will be appended to the filename
-freq is the sample rate of the source audio file
-lv is the amplitude level (0 to 32767) above which voice detection is triggered
-zc is the number of zero crossings per second above which voice detection is triggered
-headmargin and -tailmargin are the amount of silence (in milliseconds) kept before and after each audio segment
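To make the options above more concrete, here is a minimal C++ sketch of the same idea: split 16-bit mono PCM into short frames, treat a frame as voice when its peak exceeds a level threshold and its zero-crossing rate exceeds a per-second threshold, then pad each voiced frame with head and tail margins and merge spans whose padding overlaps. This is not adintool's implementation (its own counting and triggering may differ); the names `VadParams`, `Segment` and `segmentSpeech`, the 10 ms frame size and the merge rule are hypothetical choices, and reading the WAV file into memory is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// A detected speech segment as half-open sample indices [begin, end).
struct Segment {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical parameters that mirror the adintool options in spirit only.
struct VadParams {
    int sampleRate = 44100;  // like -freq
    int level = 2048;        // like -lv: peak amplitude threshold (0..32767)
    int zeroCross = 30;      // like -zc: zero crossings per second
    int headMarginMs = 600;  // like -headmargin
    int tailMarginMs = 600;  // like -tailmargin
    int frameMs = 10;        // analysis frame length (an assumption)
};

// A frame counts as voice when it is loud enough AND crosses zero often enough.
bool frameIsVoiced(const std::vector<std::int16_t>& pcm, std::size_t begin,
                   std::size_t end, const VadParams& p) {
    int peak = 0;
    int crossings = 0;
    for (std::size_t i = begin; i < end; ++i) {
        peak = std::max(peak, std::abs(static_cast<int>(pcm[i])));
        if (i > begin && ((pcm[i - 1] < 0) != (pcm[i] < 0))) ++crossings;
    }
    const double seconds = static_cast<double>(end - begin) / p.sampleRate;
    const double zcPerSecond = seconds > 0 ? crossings / seconds : 0.0;
    return peak >= p.level && zcPerSecond >= p.zeroCross;
}

// Pad every voiced frame with the head/tail margins and merge overlapping spans.
std::vector<Segment> segmentSpeech(const std::vector<std::int16_t>& pcm, const VadParams& p) {
    const std::size_t rate = static_cast<std::size_t>(p.sampleRate);
    const std::size_t frame = std::max<std::size_t>(1, rate * p.frameMs / 1000);
    const std::size_t head = rate * p.headMarginMs / 1000;
    const std::size_t tail = rate * p.tailMarginMs / 1000;
    std::vector<Segment> segments;
    for (std::size_t start = 0; start < pcm.size(); start += frame) {
        const std::size_t stop = std::min(start + frame, pcm.size());
        if (!frameIsVoiced(pcm, start, stop, p)) continue;
        const std::size_t b = start > head ? start - head : 0;
        const std::size_t e = std::min(stop + tail, pcm.size());
        if (!segments.empty() && b <= segments.back().end) {
            segments.back().end = std::max(segments.back().end, e);  // short pause: merge
        } else {
            segments.push_back({b, e});  // long pause: start a new segment
        }
    }
    return segments;
}
```

Each returned `Segment` could then be written out as its own file. In this sketch, pauses up to the head margin plus the tail margin end up inside a single segment, which is one reason larger margins produce fewer, longer pieces.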

Note that -lv and -zc will have to be adjusted for the attributes of your particular recording, while -headmargin and -tailmargin will have to be adjusted for your speakers' styles. But the values given above have worked well for my voice recordings in the past.
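One way to find starting values for -lv and -zc is to measure a stretch of the recording that contains only background noise. A reasonable heuristic (an assumption, not something from the adintool documentation) is then to pick an -lv comfortably above the measured noise peak. The C++ sketch below does only the measuring; `NoiseStats` and `measureNoise` are hypothetical names, the samples are assumed to be 16-bit mono PCM already in memory, and adintool's own zero-crossing count may be computed differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Statistics of a stretch assumed to contain only background noise (no speech).
struct NoiseStats {
    int peak;            // largest absolute sample, roughly comparable to -lv (0..32767)
    double zcPerSecond;  // sign changes per second, roughly comparable to -zc
};

// Measure peak level and zero-crossing rate over pcm[begin, end).
NoiseStats measureNoise(const std::vector<std::int16_t>& pcm, std::size_t begin,
                        std::size_t end, int sampleRate) {
    NoiseStats stats{0, 0.0};
    int crossings = 0;
    for (std::size_t i = begin; i < end; ++i) {
        stats.peak = std::max(stats.peak, std::abs(static_cast<int>(pcm[i])));
        if (i > begin && ((pcm[i - 1] < 0) != (pcm[i] < 0))) ++crossings;
    }
    const double seconds = static_cast<double>(end - begin) / sampleRate;
    if (seconds > 0) stats.zcPerSecond = crossings / seconds;
    return stats;
}
```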

Here is the documentation: http://julius.sourceforge.jp/juliusbook/en/adintool.html

In my experience, preprocessing the audio with compression and normalization gives better results and requires less tuning of the Julius arguments. These initial steps are recommended but not required.

These preprocessing steps require the open source SoX audio toolkit package, which is also available in many Linux package repositories. I use the Ubuntu universe repository.

Here is the site: http://sox.sourceforge.net

Step -2: Install SoX

sudo apt-get install sox

Step -1: Preprocess Audio

sox myOriginalRecording.wav myRecording.wav gain -b -n -8 compand 0.2,0.6 4:-48,-32,-24 0 -64 0.2 gain -b -n -2

gain -b -n balances and normalizes the audio to a given level (in dBFS)
compand compresses (in this case) the dynamic range of the audio based on the parameters
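For readers working in C/C++, the normalization stage can be sketched as plain peak scaling: find the largest absolute sample and scale everything so that it lands at the target level in dBFS, which is what `gain -n -8` asks SoX to do. This simplified sketch assumes float samples in [-1, 1] and a single channel, so the channel balancing done by -b is omitted; `normalizePeak` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Scale samples so that the largest absolute sample sits at targetDb dBFS.
void normalizePeak(std::vector<float>& samples, double targetDb) {
    float peak = 0.0f;
    for (float s : samples) peak = std::max(peak, std::fabs(s));
    if (peak == 0.0f) return;  // pure silence: nothing to scale
    const double target = std::pow(10.0, targetDb / 20.0);  // dBFS -> linear amplitude
    const double factor = target / peak;
    for (float& s : samples) s = static_cast<float>(s * factor);
}
```

Calling `normalizePeak(samples, -8.0)` before a compressor and `normalizePeak(samples, -2.0)` after it mirrors, in spirit, the two gain stages of the command above.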

Note that the compand parameters may take some time to understand completely. But the values given above have worked well for my voice recordings in the past.
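The compand parameters are easier to follow with the general mechanism in mind: an envelope follower that rises according to the attack time and falls according to the decay time, plus a transfer curve that maps the envelope's input level (dB) to an output level (dB). The C++ sketch below implements that generic compressor. It is not SoX's code: it ignores compand's soft-knee, gain, initial-volume and delay arguments, its behaviour outside the listed curve points is an assumption, and `CurvePoint`, `transferDb` and `compress` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One (input dB, output dB) point of a static transfer curve, sorted by inDb.
struct CurvePoint {
    double inDb;
    double outDb;
};

// Piecewise-linear transfer curve in dB. Outside the listed points this sketch
// keeps the gain of the nearest end point (slope 1), which is an assumption.
double transferDb(double inDb, const std::vector<CurvePoint>& pts) {
    if (inDb <= pts.front().inDb) return inDb + (pts.front().outDb - pts.front().inDb);
    if (inDb >= pts.back().inDb) return inDb + (pts.back().outDb - pts.back().inDb);
    for (std::size_t i = 1; i < pts.size(); ++i) {
        if (inDb <= pts[i].inDb) {
            const double t = (inDb - pts[i - 1].inDb) / (pts[i].inDb - pts[i - 1].inDb);
            return pts[i - 1].outDb + t * (pts[i].outDb - pts[i - 1].outDb);
        }
    }
    return inDb;  // unreachable when pts is sorted
}

// Follow the envelope with separate attack/decay times (seconds), then apply
// the gain that the transfer curve requests at that envelope level.
void compress(std::vector<float>& samples, double sampleRate, double attack,
              double decay, const std::vector<CurvePoint>& pts) {
    const double attackCoef = 1.0 - std::exp(-1.0 / (attack * sampleRate));
    const double decayCoef = 1.0 - std::exp(-1.0 / (decay * sampleRate));
    double env = 0.0;
    for (float& s : samples) {
        const double level = std::fabs(s);
        env += (level - env) * (level > env ? attackCoef : decayCoef);
        const double inDb = 20.0 * std::log10(std::max(env, 1e-9));  // avoid log(0)
        const double gainDb = transferDb(inDb, pts) - inDb;
        s = static_cast<float>(s * std::pow(10.0, gainDb / 20.0));
    }
}
```

For example, with the hypothetical points {-60, -60}, {-30, -20}, {0, -10}, a steady input at -45 dB comes out at -40 dB while a 0 dB input is pulled down to -10 dB, which narrows the dynamic range.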

Here is the documentation: http://sox.sourceforge.net/sox.html

While this will not identify each speaker, it will greatly simplify the task of doing so by ear, which may end up being the only option for a while. But I do hope you find a practical solution if one is already available.

Tags: c++, c, linux, audio, speech

Bo Liu
