There are a few techniques for direct measurement of distance: laser interferometry, lidar, IR, ultrasonic… but these are either too slow or too expensive.
Other techniques rely on a mapping between two perspectives. If you have two cameras, you need to know which pixels in the image plane of one camera correspond to which pixels in the image plane of the other. Creating this mapping with computer vision algorithms like feature tracking won't work here, because projection surfaces are generally featureless.
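Just to make concrete what that mapping buys you: in a plain rectified stereo setup (textbook pinhole-camera stuff, nothing specific to the projector approach below), once you know which pixel matches which, depth falls right out of the pixel offset. A minimal Python sketch, with variable names of my own choosing:

    def depth_from_disparity(focal_px, baseline_m, disparity_px):
        # Rectified pinhole stereo: depth from the horizontal pixel offset
        # between a matched pair of pixels.
        #   focal_px     -- focal length expressed in pixels
        #   baseline_m   -- distance between the two camera centers (meters)
        #   disparity_px -- x_left - x_right for the matched pair
        return focal_px * baseline_m / disparity_px

So the whole problem is finding those matches, and that's exactly what structured light hands you for free.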
The camera + projector combo is more promising, especially because you already have half of the rig installed. Like Memo said, I worked on this a bit last week, and posted some code: http://www.openprocessing.org/visuals/?visualID=1014.
Besides Johnny Chung Lee's thesis (which is cool, but requires extra hardware + electronics), my biggest inspiration right now is Dr. Song Zhang's work (http://www.vrac.iastate.edu/~song/index.php). His lab put together the technology that Radiohead used to capture Thom Yorke in real-time 3D (640x480, 40 fps). You might have trouble finding the papers, but I have access via my school, so I've uploaded two that I find particularly nice: http://rpi.edu/~mcdonk/of/2+1%20phase-shift.pdf, http://rpi.edu/~mcdonk/of/3%20phase-shift.pdf. Despite what Radiohead PR says, cameras were totally used to make the video.
I see two competing requirements for this technology: speed and resolution. If you have a DSLR that can take a burst of 20 frames over a minute, use that instead of a webcam, and do the calibration offline. If you want something faster at lower resolution, use a webcam and build the mapping into your app (~5 second capture). You could even step away from structured light and go laser-based, a la DAVID http://www.david-laserscanner.com/ (~1 minute capture). If you want it really fast (i.e., >1 fps), you don't want to use Gray codes (like Johnny Chung Lee or I use); you want the phase-shift method from the papers I posted above. I'm working on a hybrid technique now that does a multi-scale phase shift for high resolution and intermediate (~1 fps) capture time.
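To give a flavor of why phase shifting is fast: you only need to project three sinusoidal fringe patterns, and the per-pixel decode is a single atan2. Here's a rough numpy sketch of that core step (my own illustration, not code from the papers; the real pipeline in Zhang's work also needs phase unwrapping and a calibrated phase-to-depth conversion):

    import numpy as np

    def phase_patterns(width, height, periods=8):
        # The three patterns to project: a cosine fringe shifted by -120, 0, +120 degrees.
        x = np.arange(width, dtype=np.float64)
        base = 2.0 * np.pi * periods * x / width
        shifts = (-2.0 * np.pi / 3.0, 0.0, 2.0 * np.pi / 3.0)
        return [np.tile(127.5 + 127.5 * np.cos(base + s), (height, 1)).astype(np.uint8)
                for s in shifts]

    def wrapped_phase(i1, i2, i3):
        # Per-pixel wrapped phase from the three camera images of those patterns.
        # The result lives in (-pi, pi] and still has to be unwrapped before it
        # maps back to a projector column (and from there to depth).
        i1, i2, i3 = (np.asarray(i, dtype=np.float64) for i in (i1, i2, i3))
        return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

Compare that with Gray codes, where encoding the same axis takes on the order of log2(projector width) patterns, i.e. ~10 for a 1024-wide projector.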
One more reference I'll point out: Multiview Geometry for Camera Networks (http://www.ecse.rpi.edu/~rjradke/papers/radkemcn08.pdf) goes over the math and terms involved in doing this "correctly" (e.g., camera lenses have radial distortion that needs to be accounted for, etc.). To get something basic working you don't really need all of that (if you can do basic geometry in 3D you're set), but doing it properly is faster and more robust.
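For the "basic geometry in 3D" version (ignoring lens distortion): once calibration tells you where the camera and projector sit, every decoded projector column defines a plane of light and every camera pixel defines a ray, so recovering a 3D point is just a ray-plane intersection. A minimal numpy sketch, with made-up names, assuming everything is already expressed in one world coordinate frame:

    import numpy as np

    def intersect_ray_plane(ray_origin, ray_dir, plane_point, plane_normal):
        # Point where a camera ray hits the projector's plane of light.
        # All arguments are 3-vectors in the same world coordinate frame.
        ray_origin = np.asarray(ray_origin, dtype=np.float64)
        ray_dir = np.asarray(ray_dir, dtype=np.float64)
        plane_point = np.asarray(plane_point, dtype=np.float64)
        plane_normal = np.asarray(plane_normal, dtype=np.float64)
        t = np.dot(plane_normal, plane_point - ray_origin) / np.dot(plane_normal, ray_dir)
        return ray_origin + t * ray_dir

The Radke notes are what you want once you start caring about where those rays and planes actually come from (intrinsics, extrinsics, distortion).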
I hope some of this was useful!