audio comparison with R

Question

I am working on a project that involves speech/voice comparison. The project is used to judge the winner of mimicry competitions. In practice, I need to capture the user's voice, compare it with the original audio file, and return a percentage match. I need to develop this in R.

I have already tried the audio-related R packages (tuneR, audio, seewave), but I could not find anything about comparing recordings.

Where can I find information related to this kind of work, what is the best way to approach this type of problem, and what are the prerequisites for this kind of audio processing?

Renan Vilas Novas · Accepted Answer

Basically, the most widely used features for speech/voice comparison are the MFCCs (Mel-frequency cepstral coefficients).

There are programs that can extract these coefficients, for example Praat (Praat website).
You can also look for a library that extracts them.
[Edit: the tuneR documentation shows that it has a function to extract MFCCs; look for melfcc().]
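A minimal sketch of the tuneR route, assuming the tuneR package is installed. melfcc() returns one row of coefficients per analysis frame, so the number of rows depends on the length of the recording; averaging each coefficient over all frames is one simple way to get a fixed-length vector per recording. The helper name mfcc_vector() and the choice of 13 coefficients are assumptions for illustration, not part of tuneR. The demo uses synthetic white noise from tuneR's noise() so it runs without audio files.

```r
library(tuneR)

# Hypothetical helper: summarize a Wave object as one fixed-length MFCC vector.
mfcc_vector <- function(wave, numcep = 13) {
  if (wave@stereo) wave <- mono(wave, "both")  # melfcc() expects mono input
  m <- melfcc(wave, numcep = numcep)           # frames in rows, coefficients in columns
  colMeans(m)                                  # average each coefficient over all frames
}

# Demo on one second of synthetic white noise sampled at 16 kHz.
set.seed(1)
w <- noise(kind = "white", duration = 16000, samp.rate = 16000)
v <- mfcc_vector(w)
print(round(v, 3))

# For real recordings (file names are placeholders):
# original <- mfcc_vector(readWave("original.wav"))
# mimic    <- mfcc_vector(readWave("mimic.wav"))
```

Averaging throws away timing information; it is only one option, but it gives every recording a vector of the same length, which the pair comparisons below require.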

After extracting these features, you can use machine learning (an SVM, a random forest, or something similar) to build a classifier.

I gave a seminar on speaker recognition systems; take a look at it, it may be helpful. (Seminar)

If you have the time and interest, you could also read:
Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors.

Once you have a feature vector for each audio sample (MFCCs and/or other features, summarized to the same length for every recording, for example by averaging over frames), you need to compare pairs of feature vectors (features from A versus features from B).
You could try the absolute difference between these feature vectors:

abs(featuresA - featuresB)

The result is a vector with the same length as A's (or B's) feature vector, in which every element is >= 0.
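A small sketch of the absolute-difference pair feature in base R. The helper name pair_absdiff() is hypothetical, and the feature values are made up purely for illustration (think of them as averaged MFCCs of two recordings).

```r
# Hypothetical helper: pair feature vector = element-wise absolute difference.
pair_absdiff <- function(a, b) {
  stopifnot(length(a) == length(b))  # both recordings must use the same feature set
  abs(a - b)
}

# Illustrative values only.
featuresA <- c(12, -3, 5, 1)
featuresB <- c(10, -2, 7, 1)
absDiff <- pair_absdiff(featuresA, featuresB)
print(absDiff)  # [1] 2 1 2 0
```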

You could also try the element-wise product of the A and B features:

(A1*B1, A2*B2, ... , An*Bn)
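In R, the `*` operator on two numeric vectors of the same length is already element-wise, so this product feature is a one-liner. The values are illustrative only.

```r
# Illustrative feature vectors for recordings A and B.
featuresA <- c(12, -3, 5, 1)
featuresB <- c(10, -2, 7, 1)

# Element-wise product: (A1*B1, A2*B2, ..., An*Bn)
prodFeat <- featuresA * featuresB
print(prodFeat)  # [1] 120   6  35   1
```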

Then you need to label each pair feature vector: 1 if A and B come from the same person, 0 if they come from different people.
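A sketch of how the labelled pair dataset could be built, assuming you know which person produced each recording. The speaker IDs and the random "features" are placeholders for illustration; in practice each row of `features` would be one recording's summary vector.

```r
# Toy data: one feature vector per recording (rows) and the speaker of each recording.
set.seed(42)
speaker  <- c("p1", "p1", "p2", "p2", "p3")
features <- matrix(rnorm(length(speaker) * 4), nrow = length(speaker))

# All unordered pairs of recordings (i, j) with i < j, one pair per row.
pairs <- t(combn(nrow(features), 2))

# One row per pair: absolute-difference features plus the same-speaker label.
pairData <- data.frame(
  t(apply(pairs, 1, function(p) abs(features[p[1], ] - features[p[2], ]))),
  label = as.integer(speaker[pairs[, 1]] == speaker[pairs[, 2]])
)
print(table(pairData$label))
```

Note that different-person pairs usually far outnumber same-person pairs (here 8 versus 2), which is worth keeping in mind when evaluating the classifier.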

Usually the absolute difference performs better than the product, but you can also concatenate both vectors and test the classifier with the absolute-difference and product features together.
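A sketch of the whole pair pipeline under stated assumptions: the features are simulated, and a logistic regression (glm() from base R's stats package) stands in for the SVM / random forest suggested above, because it needs no extra packages and returns a probability that can be read as a 0-100 "match" score. The helpers pair_features() and score_pair() are hypothetical names.

```r
# Simulated data (illustration only): 4 speakers, 6 recordings each,
# 2 summary features per recording (think: two averaged MFCCs).
set.seed(7)
nSpk <- 4; nRec <- 6; nFeat <- 2
speaker  <- rep(seq_len(nSpk), each = nRec)
centers  <- matrix(rnorm(nSpk * nFeat, sd = 1), nrow = nSpk)
features <- centers[speaker, , drop = FALSE] +
  matrix(rnorm(length(speaker) * nFeat, sd = 1), ncol = nFeat)

# Hypothetical helper: absolute difference appended to the element-wise product.
pair_features <- function(a, b) c(abs(a - b), a * b)

# One row per unordered pair of recordings, labelled 1 if same speaker, else 0.
pairs <- t(combn(nrow(features), 2))
X <- t(apply(pairs, 1, function(p) pair_features(features[p[1], ], features[p[2], ])))
colnames(X) <- c(paste0("absdiff", seq_len(nFeat)), paste0("prod", seq_len(nFeat)))
pairData <- data.frame(X, label = as.integer(speaker[pairs[, 1]] == speaker[pairs[, 2]]))

# Logistic regression as a stand-in for the SVM / random forest classifier.
fit <- glm(label ~ ., data = pairData, family = binomial)

# Hypothetical helper: predicted same-speaker probability as a 0-100 "match" score.
score_pair <- function(a, b) {
  newX <- as.data.frame(t(pair_features(a, b)))
  colnames(newX) <- colnames(X)
  100 * unname(predict(fit, newdata = newX, type = "response"))
}

print(score_pair(features[1, ], features[2, ]))  # same speaker
print(score_pair(features[1, ], features[7, ]))  # different speakers
```

With an SVM or random forest, the same `pairData` table can serve as the training set; to compare feature sets, drop the `prod` columns and check whether held-out performance changes.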


Tags: r, audio-processing

Question by Dinesh · 1 answer (Renan Vilas Novas)
