Hello!
As suggested by roymacdonald, this is the collected info from this thread about the method I used to turn Kinect depth information into useful blobs that can drive a multitouch interface. The original comment can be found at the end of this post. Also, any personal comment I make about this method may be biased, since I’ve had little experience with other people’s methods. Part of the info has also been edited after being reviewed for errors.
This is also the first time I share code that seems useful, so I went out of my way to do it properly. Many thanks to patriciogonzalezvivo, roymacdonald, lahiru and of course kylemcdonald!
Info about the method
This method seems (?) to be the simplest and might also be the shortest while still being precise. It doesn’t use auto-calibration and expects the user to determine the projection area (which might explain how short it is). This might actually help the method and make it useful on irregular surfaces. We also handled the depth image as if it were 8 bpc when it is actually 11 bpc; this truncates data but still works very nicely. For those who don’t know, 8 bpc gives a palette of 256 grey levels (2^8) and 11 bpc gives 2048 (2^11), and here the grey value encodes the depth the Kinect is measuring, so more bpc equates to more depth resolution.
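Just to illustrate (this is not part of the app), the truncation from 11 bpc down to 8 bpc amounts to dropping the 3 least significant bits of each depth reading:
// Hypothetical illustration of the 11 bpc -> 8 bpc truncation:
// an 11-bit Kinect depth reading (0-2047) is squeezed into a 0-255 grey value
// by dropping its 3 least significant bits, losing some depth resolution.
unsigned short rawDepth = 1500;                        // example 11-bit reading
unsigned char grey = (unsigned char)(rawDepth >> 3);   // 1500 >> 3 == 187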
We used Microsoft’s Kinect for Windows SDK 1.0, openFrameworks 007 with ofxOpenCv and ofxKinectNui ( https://github.com/sadmb/ofxKinectNui ), but I’m guessing ofxOpenNI should work just the same while being multi-platform; the important thing is that to start off you need to somehow get the Kinect depth image.
This method was made to be used on a multitouch wall, so the assumption is that users extend their hands; the position of touch is expected to be at the tips of the fingers, which will lie on one side of the blob’s bounding box.
Pseudocode
Remembering that neither the projector nor the Kinect has to be in any particular position (except, of course, that both must see the same surface), the process is the following:
[tt]1-Capture the depth image and put it in an ofxCvGrayscaleImage that we will call tempGreyImage.
2-Remember to copy tempGreyImage into GreyImagediff (the background reference) when the user presses space.
3-Take both tempGreyImage and GreyImagediff into two greyscale ofPixels.
4-Fill another greyscale ofPixels with the result of the following pseudocode loop:
for each pixel {
    calculate the difference between the tempGreyImage pixel and the GreyImagediff pixel;
    if said difference is lower than the low threshold or higher than the high threshold, set the value to 0;
    else set the value to 255 (the maximum pixel value, i.e. white);
}
5-Take the resulting ofPixels back into another ofxCvGrayscaleImage called FinalGreyDiff.
6-NOW use warpIntoMe() to warp FinalGreyDiff and do openCV’s .threshold() and .blur() on the result.
7-Do not use the centroids as the source of touch data. Instead use the coordinates of the sides of their bounding boxes.[/tt]
Code
Since this code was used with ofxKinectNui on Windows, I’d better just post the relevant portion of code so you can try it out with ofxOpenNI if you like.
“destino_camera_warp[]” holds 4 ofPoints that are set by mouse or taken from an XML file; they define how the image will be warped prior to blob analysis. For that it also needs “entrada[]”, which holds the 4 ofPoints corners of the image.
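For reference, a minimal sketch of how those points might be filled in (the values here are made-up placeholders; in our app destino_camera_warp is set with the mouse or loaded from XML):
// "entrada" is just the four corners of the 320x240 depth image.
entrada[0].set(0, 0);
entrada[1].set(320, 0);
entrada[2].set(320, 240);
entrada[3].set(0, 240);
// "destino_camera_warp" is the quad the image gets warped to before blob analysis;
// these example values are placeholders, ours were picked with the mouse or loaded from XML.
destino_camera_warp[0].set(40, 30);
destino_camera_warp[1].set(290, 25);
destino_camera_warp[2].set(300, 215);
destino_camera_warp[3].set(35, 220);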
In the testApp.h file:
ofPoint destino_camera_warp[4];
ofPoint entrada[4];
ofImage colorImg;
ofxCvGrayscaleImage tempGrayImage;
ofPixels monoPixels;
ofPixels monoPixels_diff;
ofPixels grayPixels;
ofxCvGrayscaleImage paso_openCV;
ofxCvGrayscaleImage grayImage;
ofxCvContourFinder encuentracontornos;
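The snippets below also rely on a few members I’m not showing here in full; roughly, the header would also need something like this (names taken from the code below, exact types assumed):
// Additional members assumed by the code below (not shown in full above):
ofxKinectNui kinect;                    // the Kinect wrapper providing the depth image
ofxCvGrayscaleImage grayImage_No_Blur;  // allocated in setup()
bool bLearnBakground;                   // set to true when the user presses space
int thresholdLow, thresholdHigh;        // depth-difference clipping band
int threshold, blurcv;                  // openCV threshold and blur amount
int min_blob_size, max_blob_size;       // contour finder size limits
int cantidad_blobs;                     // maximum number of blobs to consider
bool encuentra_hoyos_en_blobs;          // whether to look for holes inside blobs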
In the testApp.cpp file:
void testApp::setup() {
    // ...
    // Relevant Code
    colorImg.allocate(320, 240, OF_IMAGE_COLOR);
    tempGrayImage.allocate(320, 240);
    monoPixels.allocate(320, 240, OF_PIXELS_MONO);
    monoPixels_diff.allocate(320, 240, OF_PIXELS_MONO);
    grayPixels.allocate(320, 240, OF_PIXELS_MONO);
    paso_openCV.allocate(320, 240);
    grayImage.allocate(320, 240);
    grayImage_No_Blur.allocate(320, 240);
}
void testApp::update() {
    // ...
    // Relevant Code
    kinect.update();
    if(kinect.isOpened()){
        colorImg.setFromPixels(kinect.getDepthPixels());
        tempGrayImage.setFromPixels(kinect.getDepthPixels());
        monoPixels.setFromPixels(tempGrayImage.getPixels(), 320, 240, 1);
        // Store the background reference frame when the user asks for it (space bar).
        if (bLearnBakground == true){
            monoPixels_diff.setFromPixels(tempGrayImage.getPixels(), 320, 240, 1);
            bLearnBakground = false;
        }
        // Difference and depth clipping in one pass: keep only the pixels whose change
        // in depth falls between the two thresholds, everything else becomes black.
        for (int i = 0; i < 320*240; i++){
            int valtemp = monoPixels[i] - monoPixels_diff[i];
            if (valtemp < thresholdLow || valtemp > thresholdHigh){
                valtemp = 0;
            } else {
                valtemp = 255; // white, the maximum 8-bit value
            }
            grayPixels[i] = (unsigned char)valtemp;
        }
        paso_openCV.setFromPixels(grayPixels);
        // Warp the clipped difference onto the user-defined projection area,
        // then threshold, blur and look for blobs.
        grayImage.warpIntoMe(paso_openCV, entrada, destino_camera_warp);
        grayImage.threshold(threshold);
        grayImage.blur(blurcv);
        encuentracontornos.findContours(grayImage, min_blob_size, max_blob_size, cantidad_blobs, encuentra_hoyos_en_blobs);
    }
}
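The bLearnBakground flag is step 2 of the pseudocode; it gets raised in keyPressed(), something like this (a sketch, assuming the standard openFrameworks callback):
void testApp::keyPressed(int key) {
    // Pressing space stores the current depth frame as the background reference.
    if (key == ' ') {
        bLearnBakground = true;
    }
}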
And that is basically it. What you want to do is NOT use the centroid of a blob but rather its bounding box, like this (for only one blob):
float valX = encuentracontornos.blobs[0].centroid.x - encuentracontornos.blobs[0].boundingRect.width/2;
float valY = encuentracontornos.blobs[0].centroid.y - encuentracontornos.blobs[0].boundingRect.height/2;
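If you want the same for every blob instead of just the first one, a sketch along these lines should do it (same contour finder as above, touch point offset from the centroid to the edge of each bounding box):
// Touch point per blob, offset from the centroid to the bounding box edge.
for (int i = 0; i < encuentracontornos.nBlobs; i++){
    float valX = encuentracontornos.blobs[i].centroid.x - encuentracontornos.blobs[i].boundingRect.width/2;
    float valY = encuentracontornos.blobs[i].centroid.y - encuentracontornos.blobs[i].boundingRect.height/2;
    // valX / valY live in the warped 320x240 image, so scale them to your projection resolution.
}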
Here is an example of why we used the bounding box:
http://forum.openframeworks.cc/t/universal-multitouch-wall-using-microsoft-sdk,-openframeworks-and-ofxkinectnui/9908/39
The big modification to the original code is the pixel loop, which replaces openCV’s absDiff() while doing the depth clipping at the same time.
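For comparison, the opencvExample-style difference looks roughly like this (grayDiff, grayBg and grayCurrent are just illustrative names); the absolute difference is thresholded afterwards, so there is only one cutoff and no way to also reject pixels that changed too much in depth:
// Standard ofxOpenCv background subtraction, as in opencvExample:
grayDiff.absDiff(grayBg, grayCurrent);  // |current - background|
grayDiff.threshold(threshold);          // single cutoff only, no depth clipping band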
The method posted here is of course very similar to openFrameworks’ opencvExample (which is the basis of ours), but that method seemed to exaggerate noise, so we got more false blobs. For a better explanation of the different results between both methods go here:
http://forum.openframeworks.cc/t/universal-multitouch-wall-using-microsoft-sdk,-openframeworks-and-ofxkinectnui/9908/56
That is basically it, I believe I have not skipped any relevant information. =D
Original Comment