Determine skeleton joints with a webcam (not Kinect)

Question

I'm trying to determine skeleton joints (or at the very least to be able to track a single palm) using a regular webcam. I've looked all over the web and can't seem to find a way to do so.

Every example I've found is using Kinect. I want to use a single webcam.

There's no need for me to calculate the depth of the joints - I just need to be able to recognize their X, Y position in the frame. Which is why I'm using a webcam, not a Kinect.

So far I've looked at:

OpenCV (the "skeleton" functionality in it is a process of simplifying graphical models, but it's not a detection and/or skeletonization of a human body).
OpenNI (with NiTE) - the only way to get the joints is to use the Kinect device, so this doesn't work with a webcam.

I'm looking for a C/C++ library (but at this point would look at any other language), preferably open source (but, again, will consider any license) that can do the following:

Given an image (a frame from a webcam) calculate the X, Y positions of the visible joints
[Optional] Given a video capture stream call back into my code with events for joints' positions
Doesn't have to be super accurate, but would prefer it to be very fast (sub-0.1 sec processing time per frame)

Would really appreciate it if someone can help me out with this. I've been stuck on this for a few days now with no clear path to proceed.

UPDATE

2 years later a solution was found: http://dlib.net/imaging.html#shape_predictor

Matěj Šmíd · Accepted Answer

To track a hand using a single camera without depth information is a serious task and topic of ongoing scientific work. I can supply you a bunch of interesting and/or highly cited scientific papers on the topic:

M. de La Gorce, D. J. Fleet, and N. Paragios, “Model-Based 3D Hand Pose Estimation from Monocular Video.,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, Feb. 2011.
R. Wang and J. Popović, “Real-time hand-tracking with a color glove,” ACM Transactions on Graphics (TOG), 2009.
B. Stenger, A. Thayananthan, P. H. S. Torr, and R. Cipolla, “Model-based hand tracking using a hierarchical Bayesian filter.,” IEEE transactions on pattern analysis and machine intelligence, vol. 28, no. 9, pp. 1372–84, Sep. 2006.
J. M. Rehg and T. Kanade, “Model-based tracking of self-occluding articulated objects,” in Proceedings of IEEE International Conference on Computer Vision, 1995, pp. 612–617.

Hand tracking literature survey in the 2nd chapter:

T. de Campos, “3D Visual Tracking of Articulated Objects and Hands,” 2006.

Unfortunately I don't know about some freely available hand tracking library.

samkhan13 · Answer

there is a simple way for detecting hand using skin tone. perhaps this could help... you can see the results on this youtube video. caveat: the background shouldn't contain skin colored things like wood.

here is the code:

''' Detect human skin tone and draw a boundary around it.
Useful for gesture recognition and motion tracking.

Inspired by: http://stackoverflow.com/a/14756351/1463143

Date: 08 June 2013
'''

# Required moduls
import cv2
import numpy

# Constants for finding range of skin color in YCrCb
min_YCrCb = numpy.array([0,133,77],numpy.uint8)
max_YCrCb = numpy.array([255,173,127],numpy.uint8)

# Create a window to display the camera feed
cv2.namedWindow('Camera Output')

# Get pointer to video frames from primary device
videoFrame = cv2.VideoCapture(0)

# Process the video frames
keyPressed = -1 # -1 indicates no key pressed

while(keyPressed < 0): # any key pressed has a value >= 0

    # Grab video frame, decode it and return next video frame
    readSucsess, sourceImage = videoFrame.read()

    # Convert image to YCrCb
    imageYCrCb = cv2.cvtColor(sourceImage,cv2.COLOR_BGR2YCR_CB)

    # Find region with skin tone in YCrCb image
    skinRegion = cv2.inRange(imageYCrCb,min_YCrCb,max_YCrCb)

    # Do contour detection on skin region
    contours, hierarchy = cv2.findContours(skinRegion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Draw the contour on the source image
    for i, c in enumerate(contours):
        area = cv2.contourArea(c)
        if area > 1000:
            cv2.drawContours(sourceImage, contours, i, (0, 255, 0), 3)

    # Display the source image
    cv2.imshow('Camera Output',sourceImage)

    # Check for user input to close program
    keyPressed = cv2.waitKey(1) # wait 1 milisecond in each iteration of while loop

# Close window and camera after exiting the while loop
cv2.destroyWindow('Camera Output')
videoFrame.release()

the cv2.findContour is quite useful, you can find the centroid of a "blob" by using cv2.moments after u find the contours. have a look at the opencv documentation on shape descriptors.

i havent yet figured out how to make the skeletons that lie in the middle of the contour but i was thinking of "eroding" the contours till it is a single line. in image processing the process is called "skeletonization" or "morphological skeleton". here is some basic info on skeletonization.

here is a link that implements skeletonization in opencv and c++

here is a link for skeletonization in opencv and python

hope that helps :)

--- EDIT ----

i would highly recommend that you go through these papers by Deva Ramanan (scroll down after visiting the linked page): http://www.ics.uci.edu/~dramanan/

C. Desai, D. Ramanan. "Detecting Actions, Poses, and Objects with Relational Phraselets" European Conference on Computer Vision (ECCV), Florence, Italy, Oct. 2012.
D. Park, D. Ramanan. "N-Best Maximal Decoders for Part Models" International Conference on Computer Vision (ICCV) Barcelona, Spain, November 2011.
D. Ramanan. "Learning to Parse Images of Articulated Objects" Neural Info. Proc. Systems (NIPS), Vancouver, Canada, Dec 2006.

Nallath · Answer

The most common approach can be seen in the following youtube video. http://www.youtube.com/watch?v=xML2S6bvMwI

This method is not quite robust, as it tends to fail if the hand is rotated to much (eg; if the camera is looking at the side of the hand or at a partially bent hand).

If you do not mind using two camera's you can look into the work Robert Wang. His current company (3GearSystems) uses this technology, augmented with a kinect, to provide tracking. His original paper uses two webcams but has much worse tracking.

Wang, Robert, Sylvain Paris, and Jovan Popović. "6d hands: markerless hand-tracking for computer aided design." Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, 2011.

Another option (again if using "more" than a single webcam is possible), is to use a IR emitter. Your hand reflects IR light quite well whereas the background does not. By adding a filter to the webcam that filters normal light (and removing the standard filter that does the opposite) you can create a quite effective hand tracking. The advantage of this method is that the segmentation of the hand from the background is much simpler. Depending on the distance and the quality of the camera, you would need more IR leds, in order to reflect sufficient light back into the webcam. The leap motion uses this technology to track the fingers & palms (it uses 2 IR cameras and 3 IR leds to also get depth information).

All that being said; I think the Kinect is your best option in this. Yes, you don't need the depth, but the depth information does make it a lot easier to detect the hand (using the depth information for the segmentation).

Determine skeleton joints with a webcam (not Kinect)

Tags:

opencv

gesture-recognition

webcam

tracking

human-interface

YePhIcK

3 Answers

Matěj Šmíd

samkhan13

Nallath

Recent Activity

Donate For Us

Determine skeleton joints with a webcam (not Kinect)

Tags:

opencv

gesture-recognition

webcam

tracking

human-interface

YePhIcK

3 Answers

Matěj Šmíd

samkhan13

Nallath

Related questions

Recent Activity

Donate For Us