
HMM algorithm for gesture recognition

I want to develop an app for gesture recognition using Kinect and hidden Markov models. I watched a tutorial here: HMM lecture

But I don't know where to start. What is the state set, and how should the data be normalized so that HMM learning is feasible? I know (more or less) how it would be done for signals and for simple "left-to-right" cases, but 3D space makes me a little confused. Could anyone describe how to begin?

Could anyone describe the steps in detail? In particular, I need to know how to build the model and what the steps of the HMM algorithm should be.

Nickon asked Jan 28 '13



2 Answers

One way to apply HMMs to gesture recognition is to use an architecture similar to the one commonly used for speech recognition.

The HMM would not be over space but over time, and each video frame (or set of extracted features from the frame) would be an emission from an HMM state.

Unfortunately, HMM-based speech recognition is a rather large area. Many books and theses have been written describing different architectures. I recommend starting with Jelinek's "Statistical Methods for Speech Recognition" (http://books.google.ca/books?id=1C9dzcJTWowC&pg=PR5#v=onepage&q&f=false) then following the references from there. Another resource is the CMU sphinx webpage (http://cmusphinx.sourceforge.net).

Another thing to keep in mind is that HMM-based systems are probably less accurate than discriminative approaches like conditional random fields or max-margin recognizers (e.g. SVM-struct).

For an HMM-based recognizer the overall training process is usually something like the following:

1) Perform some sort of signal processing on the raw data

  • For speech this would involve converting raw audio into mel-cepstrum format, while for gestures, this might involve extracting image features (SIFT, GIST, etc.)

2) Apply vector quantization (VQ) to the processed data (other clustering or discretization techniques can also be used)

  • Each cluster centroid is usually associated with a basic unit of the task. In speech recognition, for instance, each centroid could be associated with a phoneme. For a gesture recognition task, each VQ centroid could be associated with a pose or hand configuration.

3) Manually construct HMMs whose state transitions capture the sequence of different poses within a gesture.

  • Emission distributions of these HMM states will be centered on the VQ centroids from step 2.

  • In speech recognition these HMMs are built from phoneme dictionaries that give the sequence of phonemes for each word.

4) Construct a single HMM that contains transitions between each individual gesture HMM (or, in the case of speech recognition, each phoneme HMM). Then train the composite HMM with videos of gestures.

  • It is also possible at this point to train each gesture HMM individually before the joint training step. This additional training step may result in better recognizers.
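The quantization and topology pieces of steps 2 and 3 can be sketched in a few lines of Python. This is a toy illustration, not the answer's exact method: the k-means routine, the two-cluster synthetic data, and the `p_stay` self-loop probability are all assumptions made for the sketch.

```python
import numpy as np

def kmeans_vq(frames, k, iters=20):
    """Toy k-means vector quantization: returns the codebook of
    centroids and the codebook index assigned to each frame."""
    # deterministic init: spread the initial centroids across frame indices
    idx = np.linspace(0, len(frames) - 1, k).astype(int)
    centroids = frames[idx].astype(float)
    for _ in range(iters):
        # assign each frame to its nearest centroid
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned frames
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = frames[labels == j].mean(axis=0)
    return centroids, labels

def left_to_right_transitions(n_states, p_stay=0.6):
    """Left-to-right HMM topology: each state either stays put or
    advances to the next state; the last state absorbs."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = p_stay
        A[i, i + 1] = 1.0 - p_stay
    A[-1, -1] = 1.0
    return A

# toy "pose" features: two well-separated clusters in 2-D
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                    rng.normal(3.0, 0.1, (50, 2))])
centroids, labels = kmeans_vq(frames, k=2)   # step 2: VQ codebook
A = left_to_right_transitions(3)             # step 3: gesture HMM topology
```

In a real system each row of `frames` would be the feature vector extracted from one video frame in step 1, and one left-to-right HMM would be built per gesture.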

For the recognition process, apply the signal processing step, find the nearest VQ entry for each frame, then find a high scoring path through the HMM (either the Viterbi path, or one of a set of paths from an A* search) given the quantized vectors. This path gives the predicted gestures in the video.
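The path search in that recognition step can be sketched as a standard Viterbi decode over log-probabilities. This is a minimal discrete-emission version; the 2-state transition, emission, and initial distributions below are made-up toy values, not anything from the original answer.

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Most likely state path for a discrete-emission HMM.
    obs:    sequence of VQ codebook indices
    log_A:  log_A[i, j] = log P(next state j | state i)
    log_B:  log_B[i, o] = log P(observation o | state i)
    log_pi: log_pi[i]   = log P(initial state i)"""
    n_states, T = log_A.shape[0], len(obs)
    delta = np.full((T, n_states), -np.inf)    # best log-score ending in each state
    back = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):              # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy 2-state model: state 0 mostly emits symbol 0, state 1 mostly emits 1
log_A = np.log([[0.8, 0.2], [0.2, 0.8]])
log_B = np.log([[0.9, 0.1], [0.1, 0.9]])
log_pi = np.log([0.5, 0.5])
path = viterbi([0, 0, 1, 1], log_A, log_B, log_pi)  # → [0, 0, 1, 1]
```

In the composite HMM described above, the decoded state sequence identifies which gesture sub-model each stretch of frames passed through.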

user1149913 answered Oct 04 '22

I implemented the 2D version of this for the Coursera PGM class, which has Kinect gestures as the final unit.

https://www.coursera.org/course/pgm

Basically, the idea is that an HMM alone can't decide individual poses very well. In our unit, I used a variation of k-means to segment the poses into probabilistic categories. The HMM was then used to decide which sequences of poses were actually viable as gestures. Any clustering algorithm run on a set of poses is a good candidate, even if you don't know what kind of pose each cluster represents.

From there you can create a model which trains on the aggregate probabilities of each possible pose for each point of kinect data.
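A minimal sketch of that soft-assignment step, assuming frames have already been reduced to pose feature vectors: here each frame gets a probability over pose clusters via a softmax over negative centroid distances (an illustrative choice, not necessarily what the course's model used).

```python
import numpy as np

def pose_probabilities(frames, centroids, temperature=1.0):
    """Soft-assign each frame to pose clusters: a softmax over
    negative distances to the cluster centroids."""
    d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    logits = -d / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)       # shape (n_frames, n_poses)

# two hypothetical pose centroids, and one frame near each of them
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
frames = np.array([[0.1, 0.0], [2.9, 3.1]])
probs = pose_probabilities(frames, centroids)
```

These per-frame pose distributions are what the HMM then consumes when scoring whether a sequence of poses forms a plausible gesture.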

I know this is a bit of a sparse answer. That class gives an excellent overview of the state of the art, but the problem in general is a bit too difficult to be condensed into an easy answer. (I'd recommend taking the class in April if you're interested in this field.)

argentage answered Oct 04 '22