I have developed a proof-of-concept system for sound recognition using MFCC features and hidden Markov models. It gives promising results when I test the system on known sounds. However, when an unknown sound is input, the system still returns the closest match, and the score is not distinct enough to tell that it is an unknown sound. For example:
I have trained 3 hidden Markov models: one for speech, one for water coming out of a tap and one for knocking on a desk. Then I test them on unseen data and get the following results:
input: speech
HMM\knocking: -1213.8911146444477
HMM\speech: -617.8735676792728
HMM\watertap: -1504.4735097322673
So the highest score is speech, which is correct.
input: watertap
HMM\knocking: -3715.7246152783955
HMM\speech: -4302.67960438553
HMM\watertap: -1965.6149147201534
So the highest score is watertap, which is correct.
input: knocking
HMM\filler: -806.7248912250212
HMM\knocking: -756.4428782636676
HMM\speech: -1201.686687761133
HMM\watertap: -3025.181144273698
So the highest score is knocking, which is correct.
input: unknown
HMM\knocking: -4369.1702184688975
HMM\speech: -5090.37122832872
HMM\watertap: -7717.501505674925
Here the input is an unknown sound but it still returns the closest match as there is no system for thresholding/garbage filtering.
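To make the setup concrete, the pipeline looks roughly like the sketch below. This is a minimal sketch assuming hmmlearn for the HMMs and python_speech_features for the MFCC extraction; the function names are illustrative, not my exact code:

```python
# Minimal sketch of the training/scoring pipeline described above, assuming
# hmmlearn for the HMMs and python_speech_features for the MFCCs.
# Function and variable names are illustrative.
import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc
from hmmlearn import hmm

def extract_mfcc(wav_path):
    """Read a (mono) wav file and return its MFCC matrix of shape (frames, 13)."""
    rate, signal = wavfile.read(wav_path)
    return mfcc(signal, samplerate=rate)

def train_class_hmm(mfcc_sequences, n_states=6):
    """Train one GaussianHMM on a list of MFCC matrices belonging to one class."""
    X = np.concatenate(mfcc_sequences)
    lengths = [len(seq) for seq in mfcc_sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def classify(wav_path, models):
    """Score the input against every class HMM and return the best match.

    models: dict mapping class name -> trained HMM. Note that this always
    returns the closest match, even for unknown sounds (no rejection yet)."""
    features = extract_mfcc(wav_path)
    scores = {name: m.score(features) for name, m in models.items()}
    return max(scores, key=scores.get), scores
```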
I know that in keyword spotting an OOV (out-of-vocabulary) sound can be filtered out using a garbage or filler model, but that model is described as being trained on a finite set of unknown words. This can't be applied to my system, since I don't know all the sounds it may record.
How is a similar problem solved in speech recognition systems? And how can I solve my problem to avoid false positives?
The hidden Markov model is a probabilistic model used to describe the statistical behaviour of a random process. It assumes that an observed event does not correspond directly to the underlying state, but is generated from it through a set of probability distributions.
An HMM provides solutions to three problems, evaluation, decoding and learning, which are used to find the most likely classification.
Disadvantages of HMM: an HMM depends only on each state and its corresponding observation. Sequence labeling, however, also depends on aspects such as the observed sequence length, word context and others.
A hidden Markov model is an extension of a Markov model in which both the transitions between states and the observations emitted from each state are probabilistic. In this case, the probability of emitting observation o_t at state s is written b_s(o_t).
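As a sketch of the evaluation problem mentioned above, the forward algorithm computes log P(O | model) from the initial distribution, the transition matrix and the emission probabilities b_s(o_t). The toy model below uses made-up numbers purely for illustration:

```python
# Forward algorithm in the log domain for a discrete-observation HMM.
# Computes log P(O | model); the numbers below are made up for illustration.
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_pi, log_A, log_B, obs):
    """log_pi: (S,) initial state log-probs
    log_A:  (S, S) transition log-probs, log_A[i, j] = log P(state j | state i)
    log_B:  (S, V) emission log-probs, log_B[s, o] = log b_s(o)
    obs:    sequence of observation symbol indices"""
    alpha = log_pi + log_B[:, obs[0]]                 # alpha_1(s) = pi_s * b_s(o_1)
    for o in obs[1:]:
        # alpha_t(s) = b_s(o_t) * sum_s' alpha_{t-1}(s') * A[s', s]
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp(alpha)                           # log P(O | model)

# Toy 2-state, 3-symbol example
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3],
                [0.4, 0.6]])
log_B = np.log([[0.5, 0.4, 0.1],
                [0.1, 0.3, 0.6]])
print(forward_log_likelihood(log_pi, log_A, log_B, [0, 1, 2, 2]))
```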
To reject other words you need a filler model.
This is a statistical hypothesis test. You have two hypotheses (the word is known vs. the word is unknown). To make a decision you need to estimate the probability of each hypothesis.
The filler model is trained from the speech you have, just in a different way; for example, it might be a single Gaussian over any speech sound. You compare the score from the generic filler model with the score from the word HMM and make a decision. For more in-depth information and advanced algorithms you can check any paper on keyword spotting. This thesis has a good review:
A. J. Kishan Thambiratnam, Acoustic Keyword Spotting in Speech with Applications to Data Mining.
http://eprints.qut.edu.au/37254/1/Albert_Thambiratnam_Thesis.pdf
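As a rough sketch of that decision (my own illustration, not taken from the thesis): train a single Gaussian on all training frames as the filler, then accept a keyword only if the per-frame log-likelihood ratio against the filler exceeds a threshold tuned on held-out data. The library calls (an hmmlearn-style score(), scipy's multivariate_normal) and the threshold value are assumptions:

```python
# Sketch of a word-vs-filler decision: a single-Gaussian filler over all
# training MFCC frames and a per-frame log-likelihood ratio test.
# Names and the default threshold are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

def train_filler_gaussian(all_training_mfccs):
    """Fit one Gaussian to every MFCC frame from every training sound."""
    frames = np.concatenate(all_training_mfccs)
    return multivariate_normal(mean=frames.mean(axis=0), cov=np.cov(frames.T))

def accept_keyword(features, keyword_hmm, filler, threshold=0.0):
    """Accept the keyword only if it beats the filler by `threshold` per frame.

    keyword_hmm: a trained HMM exposing score() (e.g. hmmlearn), returning
    the log-likelihood of the whole feature sequence."""
    keyword_score = keyword_hmm.score(features)        # log P(O | keyword HMM)
    filler_score = filler.logpdf(features).sum()       # frames treated as independent
    ratio = (keyword_score - filler_score) / len(features)
    return ratio > threshold, ratio
```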
So what I have done is create my own simplified version of a filler model. Each HMM, representing the watertap, knocking and speech sound respectively, is a separate 6-state HMM trained on training sets of 30, 50 and 90 sounds of various lengths, from 0.3 s to 10 s. Then I created a filler model, which is a 1-state HMM trained on all of the training-set sounds for knocking, watertap and speech. If an HMM's score for a given sound is greater than the filler's score, the sound is recognized; otherwise it is treated as an unknown sound (a sketch of this setup is shown below, after the test results). I don't have much data, but I performed the following test for false-positive rejection and true-positive rejection on unseen sounds.
True-positive rejection:
knocking: 1/11 = 90% accuracy
watertap: 1/9 = 89% accuracy
speech: 0/14 = 100% accuracy
False-positive rejection:
tested 7 unknown sounds
6/7 = 86% accuracy
So from this quick test I can conclude that this approach gives reasonable results, although I have a feeling it may not be enough.
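For reference, here is a minimal sketch of this filler-model setup, assuming hmmlearn; variable names are illustrative and the training-data loading is not shown:

```python
# Sketch of the simplified filler model: a 1-state GaussianHMM trained on the
# pooled knocking/watertap/speech training MFCCs. A sound is accepted only if
# its best class HMM score beats the filler score, otherwise it is "unknown".
import numpy as np
from hmmlearn import hmm

def train_filler_hmm(all_mfcc_sequences):
    """1-state HMM trained on every training sequence from every known class."""
    X = np.concatenate(all_mfcc_sequences)
    lengths = [len(seq) for seq in all_mfcc_sequences]
    filler = hmm.GaussianHMM(n_components=1, covariance_type="diag", n_iter=10)
    filler.fit(X, lengths)
    return filler

def classify_with_rejection(features, class_models, filler):
    """Return the best class if it beats the filler, otherwise 'unknown'."""
    scores = {name: m.score(features) for name, m in class_models.items()}
    best = max(scores, key=scores.get)
    if scores[best] > filler.score(features):
        return best
    return "unknown"
```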