CMU Sphinx for Voice/Speaker Recognition

I'm looking for a way to match against a known data set, say a list of MP3 or WAV files, each of which is a sample of someone speaking. At this point I know that file ABC is Person X speaking.

I would then like to take another sample and do some voice matching to show whose voice it most likely is, given the known data set.

Also, I don't necessarily care what the person has said, as long as I can find a match, i.e. I don't need any transcription.

I'm aware CMU Sphinx doesn't do speaker recognition itself and is primarily used for speech-to-text, but I have seen other systems, e.g. LIUM Speaker Diarization (http://cmusphinx.sourceforge.net/wiki/speakerdiarization) or the VoiceID project (https://code.google.com/p/voiceid/), which use CMU Sphinx as a base for this type of work.

If I am to use CMU Sphinx, how can I do voice matching?

Also, if CMU Sphinx isn't the best framework, is there an open-source alternative?

Dominic asked Jan 10 '13 00:01



1 Answer

This subject is complex enough for a PhD thesis. As of right now, there are no good, reliable systems for it.

The task you're taking on is a very complex one. How you should approach it depends on your situation:

  • Do you have a limited number of people? How many?
  • How much data do you have for each person?

If you have very few people to recognize, you may attempt something as simple as measuring the formants of each person's voice and comparing them to those of the new sample.
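As a rough illustration of the comparison step only, here is a minimal sketch in Python. It assumes the formants have already been measured with some external tool; the speaker names and the formant values (F1, F2, F3 in Hz) are made up for the example, and the Euclidean nearest-neighbour rule is just one simple choice, not an established method for this problem.

```python
import math

# Hypothetical mean formant frequencies (F1, F2, F3) in Hz per known speaker.
# These numbers are invented for illustration; real values would have to be
# measured from each person's recordings with a formant-analysis tool.
KNOWN_SPEAKERS = {
    "person_x": (730.0, 1090.0, 2440.0),
    "person_y": (570.0, 840.0, 2410.0),
    "person_z": (660.0, 1720.0, 2410.0),
}


def formant_distance(a, b):
    """Euclidean distance between two formant vectors, in Hz."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_speaker(sample, known=KNOWN_SPEAKERS):
    """Return (name, distance) for the known speaker nearest to the sample."""
    name = min(known, key=lambda n: formant_distance(known[n], sample))
    return name, formant_distance(known[name], sample)


if __name__ == "__main__":
    # Hypothetical formants measured from the unknown recording.
    unknown = (720.0, 1100.0, 2450.0)
    print(closest_speaker(unknown))
```

Keep in mind that formants depend heavily on which vowel is being spoken, so a single averaged vector per person is a crude feature; this is only likely to separate a handful of clearly different voices.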

Otherwise, you will have to contact academics who work on the subject or jury-rig a solution of your own. Either way, as I said, it is a difficult problem.

Dariusz answered Oct 21 '22 10:10
