I'm looking for a way to match against a known data set, let's say a list of MP3 or WAV files, each of which is a sample of someone speaking. At this point I know file ABC is of Person X speaking.
I would then like to take another sample and do some voice matching to show who the voice most likely belongs to, given the known data set.
Also, I don't necessarily care what the person has said, as long as I can find a match, i.e. I don't need any transcription or the like.
I'm aware CMU Sphinx doesn't do speaker recognition on its own, and that it's primarily used for speech-to-text, but I have seen other systems, e.g. the LIUM Speaker Diarization toolkit (http://cmusphinx.sourceforge.net/wiki/speakerdiarization) or the VoiceID project (https://code.google.com/p/voiceid/), which use CMU Sphinx as a base for this type of work.
If I am to use CMU, how can I do voice matching?
Also, if CMU Sphinx isn't the best framework for this, is there an open-source alternative?
This is a subject complex enough to be a PhD thesis in its own right. There are no off-the-shelf systems that do this reliably as of right now.
The task you're up for is a very complex one. How you should approach it depends on your situation.
If you have very few people to recognize, you may attempt something as simple as extracting the formant frequencies of each person's samples and comparing them against the formants of the new sample.
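To make the formant idea concrete, here is a minimal sketch using only numpy. It estimates formants as the pole angles of an LPC (linear prediction) fit, and matches a new sample to whichever enrolled speaker has the nearest formant vector. Everything here is illustrative: the `synth_voice` "speakers" are synthetic resonator-filtered noise, and all function names are my own, not part of CMU Sphinx, LIUM, or VoiceID. Real recordings would need per-frame analysis, robust formant tracking, and far more careful features (real systems use MFCCs and statistical models, not two formants).

```python
import numpy as np

def lpc_coeffs(x, order):
    """Autocorrelation-method LPC: fit an all-pole model of the given order."""
    n = len(x)
    r = np.correlate(x, x, mode='full')[n - 1:n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])               # Yule-Walker equations
    return np.concatenate(([1.0], a))

def estimate_formants(x, sr, n_formants=2):
    """Estimate formant frequencies (Hz) as the angles of the LPC poles."""
    x = x * np.hamming(len(x))                 # taper to reduce edge effects
    a = lpc_coeffs(x, 2 * n_formants)          # one conjugate pole pair per formant
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]          # keep one root of each conjugate pair
    freqs = np.sort(np.angle(roots)) * sr / (2 * np.pi)
    return freqs[:n_formants]

def synth_voice(formants_hz, sr, n, bw=100.0, seed=0):
    """Toy 'speaker': white noise passed through resonators at the given formants."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    for f in formants_hz:
        r = np.exp(-np.pi * bw / sr)           # pole radius from bandwidth
        a1 = -2.0 * r * np.cos(2 * np.pi * f / sr)
        a2 = r * r
        y = np.zeros(n)
        for i in range(n):                     # y[i] = x[i] - a1*y[i-1] - a2*y[i-2]
            acc = x[i]
            if i >= 1:
                acc -= a1 * y[i - 1]
            if i >= 2:
                acc -= a2 * y[i - 2]
            y[i] = acc
        x = y
    return x

sr = 8000
# Enroll two toy speakers with different (hypothetical) formant layouts.
alice_ref = estimate_formants(synth_voice([700, 1200], sr, 4096, seed=1), sr)
bob_ref   = estimate_formants(synth_voice([300, 2300], sr, 4096, seed=3), sr)
# A fresh sample from "Alice" (different noise realization, same resonances).
alice_new = estimate_formants(synth_voice([700, 1200], sr, 4096, seed=2), sr)

d_same = np.linalg.norm(alice_new - alice_ref)
d_diff = np.linalg.norm(alice_new - bob_ref)
match = 'Alice' if d_same < d_diff else 'Bob'
```

On this synthetic data the new sample lands much closer to Alice's formant vector than to Bob's, which is the whole scheme: enroll each known file as a feature vector, then nearest-neighbour match the unknown sample. It degrades quickly with more speakers or real-world noise, which is why it only makes sense for "very few people."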
Otherwise, you will have to contact academics who work on the subject or jury-rig a solution of your own. Either way, as I said, it is a difficult problem.