There are several approaches using ofxCv and/or ofxOpenCv. Most of them require first extracting the hand contour (aka blob) from the background. Then, depending on the quality of the extraction, the contour can be analyzed to calculate its centroid (i.e. its geometric center), which can stand in for a mouse position.
More sophisticated contour analysis techniques can be used to determine concavities (these may represent the spaces between fingers) and convexities (these might represent the fingertips) (e.g. this video or just about any of these). Further analysis can be done to verify hand orientation by fitting the contours to a hand model. But that is a bit more sophisticated, and at this point is probably better done with 3D sensing devices (e.g. this work presented at CHI earlier this year) rather than RGB cameras.
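The usual starting point for that kind of analysis is the convex hull of the contour: contour points that fall inside the hull are the concavities (between fingers), and hull vertices are candidate fingertips. As a sketch of the idea (this is the standard monotone-chain algorithm, not the ofxCv/OpenCV call you'd actually use):

```cpp
#include <vector>
#include <algorithm>

struct Pt { int x, y; };

// Cross product of OA x OB; > 0 means a counter-clockwise turn.
long long cross(const Pt& o, const Pt& a, const Pt& b) {
    return (long long)(a.x - o.x) * (b.y - o.y)
         - (long long)(a.y - o.y) * (b.x - o.x);
}

// Monotone-chain convex hull: returns hull vertices in counter-clockwise
// order. Contour points NOT on the hull (e.g. the valleys between fingers)
// are dropped; comparing contour vs. hull is what exposes the concavities.
std::vector<Pt> convexHull(std::vector<Pt> pts) {
    std::sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    int n = (int)pts.size(), k = 0;
    if (n < 3) return pts;
    std::vector<Pt> hull(2 * n);
    for (int i = 0; i < n; ++i) {                  // build lower hull
        while (k >= 2 && cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (int i = n - 2, t = k + 1; i >= 0; --i) {  // build upper hull
        while (k >= t && cross(hull[k-2], hull[k-1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1);
    return hull;
}
```

With OpenCV you'd just call `cv::convexHull` and `cv::convexityDefects`, but the geometry underneath is this.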
Anyway, to track a hand with a normal RGB camera, the first step is usually to isolate the hand from the background. This can be done with a standard background subtraction and thresholding operation (see
ofxCv/example-background) to produce a binary image. The resulting binary image is then passed to a contour finder for connected-component analysis, resulting in a set of contours (aka blobs).
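The subtraction-plus-threshold step boils down to very little code. Here's a standalone sketch on raw grayscale buffers (the function name is mine; ofxCv's RunningBackground does this, plus background adaptation, for you):

```cpp
#include <vector>
#include <cstdlib>

// Per-pixel absolute difference against a stored background frame, then a
// threshold to produce a binary (0/255) image -- the core operation behind
// ofxCv's example-background, sketched on plain grayscale pixel buffers.
std::vector<int> subtractAndThreshold(const std::vector<int>& frame,
                                      const std::vector<int>& background,
                                      int thresh) {
    std::vector<int> binary(frame.size(), 0);
    for (size_t i = 0; i < frame.size(); ++i) {
        if (std::abs(frame[i] - background[i]) > thresh) binary[i] = 255;
    }
    return binary;
}
```

Pixels that differ from the stored background by more than the threshold end up white; everything else is black, and the white regions are what the contour finder turns into blobs.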
While there are color-based methods of isolating the hand from the background (see examples of skin-color based segmentation here), these can be less effective in real-world installation settings when presented with a wider range of skin tones.
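For a sense of why: these methods typically just test whether each pixel falls inside a fixed color box. A minimal sketch, with a rough illustrative HSV range of the kind often seen in tutorials (not a tuned or universal one, which is exactly the problem):

```cpp
// Classify a pixel as "skin" if its HSV components fall inside a fixed box.
// The range here is a rough illustrative heuristic, not a recommended one --
// any fixed box will miss some skin tones and match some backgrounds,
// which is why this approach is fragile in installation settings.
bool inSkinRange(int h, int s, int v) {
    return h >= 0 && h <= 25 && s >= 40 && s <= 180 && v >= 60 && v <= 255;
}
```

You can widen the box, but the wider it gets the more of the background it matches, so there's no setting that works for everyone in every room.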
In the end, the most reliable way of tracking hands (or anything for that matter) is to reduce the “noise” (i.e. the background) in your “signal” (i.e. the hand) as much as possible. One of the easiest ways to do this is to pick a camera orientation that results in a high-contrast, homogeneous, stable background (think of a camera pointed down onto a monochromatic surface so that the hand stands in sharp contrast to the background). Another way is the “shadow puppet” approach: a shadow is very high contrast, and sometimes a simple threshold makes background subtraction unnecessary. To avoid shadows cast by light in the visible spectrum, shadows are often cast using near-infrared light. These can be cast by IR LEDs (or flood lights with dark red or Wratten gel filters) and “sensed” by cameras with their IR filters removed. Many cheap webcams such as the PS3 Eye can be modified to remove the IR filter. IR illuminators can be made or found on eBay or in stores that sell closed-circuit surveillance cameras. I found the diagrams (e.g. 1 2 3) and technical rider of @zach and @golan’s Messa di Voce to be very instructive early on. Anyway, all of these simple (and not-so-simple) tricks have traditionally been used to make it easier to remove the background.
But all of that became much, much easier with the Kinect and subsequent RGBD cameras. They were cheap and all but solved the background subtraction problem (for most smaller-scale applications at least) by providing a clean “depth” image that can be separated into background and foreground based on physical location (i.e. typically, things that are farther from the depth camera are in the background and things that are closer are in the foreground). Thus, a simple thresholding operation can classify everything within, say, 1 meter of the sensor as foreground.
That said, depth can be determined (with pretty decent results) using stereo RGB cameras (see this for example). Previously, high-res/high-speed synchronized stereo camera rigs were super expensive and computationally intensive, so they weren’t used all that much in art contexts. But hacked PS3 Eye cameras can be synchronized to give pretty nice results (much of this RGB camera hacking was pioneered by various members of the oF community who spent a lot of time figuring that stuff out 5 or 6 years ago). Needless to say, the PS3 Eye is still alive and well in the oF community, and the cameras are now super inexpensive. Now that they are so cheap, some of us have recently purchased hundreds of them for various experiments.
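For the curious, the geometry behind stereo depth is one line: once the two cameras are calibrated, depth follows from Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity (how far a feature shifts between the two images) in pixels. A sketch with made-up example numbers, not from any particular rig:

```cpp
// Depth from stereo disparity: Z = f * B / d. focalPx is the focal length
// in pixels, baseline is the camera separation (in whatever unit you want
// the depth in), disparityPx is the pixel shift of a matched feature.
// Larger disparity = closer object. Zero disparity = effectively infinite.
double depthFromDisparity(double focalPx, double baseline, double disparityPx) {
    if (disparityPx <= 0) return -1.0; // no match, or point at infinity
    return focalPx * baseline / disparityPx;
}
```

The expensive part was never this formula -- it's finding the matching features between the two images fast enough, which is why synchronized cameras matter so much.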
Anyway … hopefully that isn’t too much information. It’s been a lot of fun to watch how these things have changed since oF got started.