Universal Multitouch Wall using Microsoft SDK, openFrameworks and ofxKinectNui

We could have used floating point, but when we tried it out we were able to read the tips of our fingers (or at least get very close to them), so we just left it as it was.

We get really good precision with almost no calibration, so I’m thinking that maybe the problem is not the code itself but the surface you intend to use (I know you tried LCDs). Now that I remember, you should be able to use irregular surfaces with your method, as it seems to be the same as ours. I’ll get the code and post it later (it’s not on this computer).

I also think that if the method is similar, the only real difference could be how the values are modified after calculating the difference. Since ofxOpenCv has its own way of dealing with thresholds, we gave it what it needed to get the job done easily; that is the reason we process per pixel.
To me this seems to be more of a signal processing problem than a programming one, so we worked it out on paper and then wrote the code really quickly.

Edit-> I also remember trying the brightness/contrast function in openFrameworks, which is supposed to work like the one in Photoshop. We never got it to work, so we just processed every pixel ourselves to give full contrast to every difference.

@irregular,
just wondering what you mean by “as to obtain a very precise ‘monochromatic’ image” :slight_smile:

The simpleKinectTouch method I used was pretty much the same as the one you describe. I uploaded the source to my GitHub: https://github.com/roymacdonald/ofxSKT
It’s quite messy and uses a lot of image instances just for debugging, so it needs a real cleanup,
yet you can go through the image processing methods quite easily.

I have to go now. @irregular, hope we can meet soon.
cheers

@roymacdonald
My code is very clean. The thing about the precise monochromatic image is just that: we weren’t very happy with the way ofxOpenCv handled thresholding, so we processed the image beforehand, saturating the whites above a certain threshold to avoid unwanted quantization.

We tried simpleKinectTouch and it never worked as well as ours did; otherwise we would never have tried making another one from scratch. :wink:
My thought is that the method is very common now, yet we get a lot of precision with very little effort, and I also think that in our case the correct sequence of code is what makes it precise.
Here comes the code (no images since I don’t have a Kinect here right now). Remember that it was made with ofxKinectNui, so it depends on the Microsoft SDK, which depends on Windows.

Remembering that neither the projector nor the Kinect has to be in any particular position (except, of course, both being able to see the same surface), the process is the following:

1-Capture the depth image and put it into an ofImage.
2-Take that ofImage into an ofxCvGrayscaleImage, and keep another one for the difference (background).
3-Take the ofxCvGrayscaleImage into a monochromatic ofPixels.
4-Fill another monochromatic ofPixels with the result of the following pseudocode loop:
for each pixel {
    calculate the difference between the current monochromatic pixel and the difference (background) pixel;
    if said difference is lower than threshold A or higher than threshold B, set the value to 0;
    else set the value to 1000 (255 should be ok, I program funny :stuck_out_tongue: );
}
5-Take the ofPixels back into an ofxCvGrayscaleImage.
6-NOW do warpIntoMe and the openCV processing.
7-Do not use centroids as the source of touch data. Instead use the coordinates of the sides of the bounding box.

The method is of course very much like the openCV example, because that is where it came from, but the important thing here is the sequence. We tried a few other sequences. For example, doing warpIntoMe before the frame difference leads to (bilinear?) resizing of the image, which produces the typical gradients in the contours of the blobs, and that didn’t work very well. It also exaggerated noise, so we got more false blobs.

And of course I may be wrong about all of this, but it seems to work very well for us. Also, you can see that no external addons are required; the code is really short and has no programming tricks (mostly because I know none! xD I’m not really a programmer).

Since it is meant to be used with ofxKinectNui on Windows, I’d better just post the relevant portion of code. I also don’t have a Kinect here.
destino_camera_warp is an array of 4 ofPoints set by mouse; they could be set with automatic calibration, but that is beyond my knowledge right now. Anyway, it only defines how the image will be warped prior to blob analysis; for that it also needs entrada, which holds the 4 corner ofPoints of the source image.

In the testApp.h file:

  
  
ofPoint                destino_camera_warp[4]; // warp destination corners, set by mouse  
ofPoint                entrada[4];             // source image corners  
ofImage                colorImg;  
ofxCvGrayscaleImage    tempGrayImage;  
ofPixels               monoPixels;  
ofPixels               monoPixels_diff;        // background / reference frame  
ofPixels               grayPixels;  
ofxCvGrayscaleImage    paso_openCV;  
ofxCvGrayscaleImage    grayImage;  
ofxCvGrayscaleImage    grayImage_No_Blur;      // allocated in setup() below  
ofxCvContourFinder     encuentracontornos;     // contour finder  
// (kinect, bLearnBakground, thresholds and blob parameters are omitted here)  
  

In the testApp.cpp file:

  
void testApp::setup() {  
  // ...  
  // Relevant Code  
  colorImg.allocate(320, 240, OF_IMAGE_COLOR);  
  tempGrayImage.allocate(320,240);  
  monoPixels.allocate(320,240,OF_PIXELS_MONO);  
  monoPixels_diff.allocate(320,240,OF_PIXELS_MONO);  
  grayPixels.allocate(320,240,OF_PIXELS_MONO);  
  paso_openCV.allocate(320,240);  
  grayImage.allocate(320,240);  
  grayImage_No_Blur.allocate(320,240);  
}  

  
void testApp::update() {  
  // ...  
  // Relevant Code  
  kinect.update();  
  if(kinect.isOpened()){  
    colorImg.setFromPixels(kinect.getDepthPixels());  
    tempGrayImage.setFromPixels(kinect.getDepthPixels());  
    monoPixels.setFromPixels(tempGrayImage.getPixels(), 320, 240, 1);  
    // capture the background (reference) frame on demand  
    if (bLearnBakground == true){  
        monoPixels_diff.setFromPixels(tempGrayImage.getPixels(), 320, 240, 1);  
        bLearnBakground = false;  
    }  
    // per-pixel difference against the background, clipped to a depth band  
    for (int i = 0; i < 320*240; i++){  
        int valtemp = monoPixels[i] - monoPixels_diff[i];  
        if (valtemp < thresholdLow || valtemp > thresholdHigh){  
            valtemp = 0;  
        } else {  
            valtemp = 1000; // wraps to 232 as unsigned char; 255 is the intent  
        }  
        grayPixels[i] = (unsigned char)valtemp;  
    }  
    paso_openCV.setFromPixels(grayPixels);  
    // warp AFTER the difference, then threshold, blur and find blobs  
    grayImage.warpIntoMe(paso_openCV, entrada, destino_camera_warp);  
    grayImage.threshold(threshold);  
    grayImage.blur(blurcv);  
    encuentracontornos.findContours(grayImage, min_blob_size, max_blob_size, cantidad_blobs, encuentra_hoyos_en_blobs);  
  }  
}  

And that is basically it. What you want to do is NOT use the centroid of a blob, but rather the bounding box, like this (for only one blob):

  
  
float valX = encuentracontornos.blobs[0].centroid.x;  
float valY = encuentracontornos.blobs[0].centroid.y - encuentracontornos.blobs[0].boundingRect.height/2;  
  
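If you want one touch point per blob rather than just blobs[0], the same idea extends to a loop. A rough sketch only (untested, using the same contour finder variable as above):

  
// Sketch: one touch point per detected blob, same idea as above (untested)  
vector<ofPoint> touchPoints;  
for (int i = 0; i < encuentracontornos.nBlobs; i++){  
    float tx = encuentracontornos.blobs[i].centroid.x;  
    float ty = encuentracontornos.blobs[i].centroid.y - encuentracontornos.blobs[i].boundingRect.height/2;  
    touchPoints.push_back(ofPoint(tx, ty));  
}  
  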

I’ll make a quick Photoshop mock-up of what you should get. Next week I’ll get the Kinect back for some real-time captures.

So here it is. Some info about it: it’s a Photoshop representation of what I can remember:
2-The small white dots that appear are noise; they are easily discarded using the ofxOpenCV parameter for minimum blob size. (I was wrong about jitter before, sorry! No sleep lately.)
3-Blur values can be changed on the fly, of course.
4-This example expects the user to extend a hand toward a wall. We assume that a person who extends a hand toward it will touch with the tip of whichever finger is longest; what matters here is the use of the centroid X and the Y minus the bounding box height divided by 2.

I can think of many ways to improve the algorithm. We rarely use the blur command and instead get at most 5 blobs per hand; this could be fixed by somehow telling the program to discard or merge some of them (something like the sketch below), but it really doesn’t matter that much. For a simple wall, hand-level resolution is enough to be very useful.
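For instance, nearby touch points could simply be merged. This is just an idea, not something we use; mergeRadius is a made-up value and touchPoints is the vector from the earlier sketch:

  
// Sketch only: merge touch points that fall within mergeRadius pixels of each other  
float mergeRadius = 15.0f;              // made-up value, in image pixels  
vector<ofPoint> merged;  
for (int i = 0; i < (int)touchPoints.size(); i++){  
    bool absorbed = false;  
    for (int j = 0; j < (int)merged.size(); j++){  
        if (touchPoints[i].distance(merged[j]) < mergeRadius){  
            merged[j] = (merged[j] + touchPoints[i]) / 2.0f;  // keep the midpoint  
            absorbed = true;  
            break;  
        }  
    }  
    if (!absorbed) merged.push_back(touchPoints[i]);  
}  
  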

![](http://forum.openframeworks.cc/uploads/default/2459/Foto 1.png)

This WAS designed for a multitouch wall, so the assumptions and code came from that purpose. If you wish to use it on a table, I’d assume people sit around it, so the bounding box offsets should somehow point radially inward.
Not necessarily, of course; these are just ideas I’ve had in my head.

Hey, I have a different question. I just got a MacBook Pro and I’m just starting to use it. What platform do you guys use to run simpleKinectTouch and ofxInteractiveSurface?
As I mentioned, the one I made runs on Windows with the Microsoft Kinect SDK.

@irregular,

Many Thanks for sharing the code and the technique! :slight_smile:

As you have mentioned, I think the most important part could be the sequence.

What do you mean by a platform to test on your MacBook Pro? We usually use ofxKinect (https://github.com/ofTheo/ofxKinect) with the Xcode bundle of OF.

If you want skeleton tracking and such you can use ofxOpenNI https://github.com/gameoverhack/ofxOpenNI

Of course, for the interactive surface ofxKinect would be the best though.

Again, awesome work!

@lahiru
Thanks!
Well, I have to mention this: the reason behind the modified difference of the ofxOpenCv example is that the absDiff and threshold functions tended to cut off fingers very easily. I’ll make a comparison later. I may be wrong about the whole thing anyway, but I might as well try. :smiley:

Let this be a guide for learning how to do this stuff with the kinect. xD

@irregular

I think you are correct about the OpenCv functions!

In my code, I use absDiff()… and I warp the image before finding the touch points. This leads to serious cutoff/loss of finger data/pixels in the image. For me, the depth clipping of the Kinect highly affects the final result.

Just curious, are you using depth clipping of the Kinect along with this method?

I really appreciate that you brought up this topic. Learnt so much. :slight_smile:

Actually, this part of the code is the depth clipping combined with turning the remaining pixels full white. The Kinect and the addon’s only job is to get the depth image; the rest is the code I posted.

  
for (int i = 0; i < 320*240; i++){  
    int valtemp = monoPixels[i] - monoPixels_diff[i];  
    if (valtemp < thresholdLow || valtemp > thresholdHigh){  // cuts off any pixel outside a certain depth range  
        valtemp = 0;  
    } else {  
        valtemp = 1000;  // or 255  
    }  
    grayPixels[i] = (unsigned char)valtemp;  
}  

It may be similar to absDiff, but I think it does things the other way around. That is why I think it works better, but I’ll have to try it with the Kinect. :stuck_out_tongue:

@irregular,
great to see your code. It’s really neat!

For using simpleKinectTouch in OSX I ported the Windows version, which used OpenCV and Qt (the GUI lib, not QuickTime); for that I had to compile OpenCV manually. Then I decided to wrap it into an addon (so as to get rid of Qt).

I’ll take a closer look at what OpenCV’s absDiff is actually doing. It seems weird that it behaves differently from your algorithm.

Just a few thoughts on your code. You are using an ofImage to store the depth image instead of an ofShortImage. The ofImage is 8 bits per channel and the ofShortImage is 16 bpc, while the depth image from the Kinect is 11 bpc. If you use an ofShortImage and the corresponding method from ofxKinectNui that returns the RAW depth image, you’ll get much better depth resolution.
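Something along these lines; this is a sketch only, and getDistancePixels() is my assumption for whatever ofxKinectNui method returns the raw depth, so check the addon header:

  
// Sketch: keep the depth at 16 bits instead of 8.  
// getDistancePixels() is an assumption; adapt it to whatever the addon actually provides.  
ofShortPixels rawDepth;                   // 16 bits per channel  
ofShortImage  depthImage;  
rawDepth = kinect.getDistancePixels();    // assumed API, returning the raw depth values  
depthImage.setFromPixels(rawDepth);  
// the per-pixel difference and thresholds then work on the Kinect's  
// 11 effective bits instead of a 0-255 range  
  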

Another nice thing about the simpleKinectTouch method is that the background is an average image built from 100 frames, which gives a much smoother image with far fewer black pixels (rough sketch of the idea below).
What are you doing with the blobs, then? Some sort of tracking/labeling? OSC? TUIO? ofEvents?
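The averaging itself is nothing fancy. A rough sketch of the idea only (not the actual simpleKinectTouch code; accum would be a vector<float> of size 320*240 and accumFrames an int, both members of your testApp, cleared before learning the background):

  
// Sketch: build the background as the average of 100 depth frames  
if (bLearnBakground){  
    for (int i = 0; i < 320*240; i++) accum[i] += monoPixels[i];  
    accumFrames++;  
    if (accumFrames >= 100){  
        for (int i = 0; i < 320*240; i++){  
            monoPixels_diff[i] = (unsigned char)(accum[i] / accumFrames);  
        }  
        bLearnBakground = false;  
    }  
}  
  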

As for autocalibration, check out patriciogonzalezvivo’s method. It’s quite simple and works very well.

If you’re going to use ofxOpenNI, download the experimental version from gameover’s GitHub. It’s way better than the one in his master branch.

Best!

BTW, irregular, let’s get together some day. How are you doing for time? Regards!

@Roy
Sure, let’s meet up! Sorry for the delay, it was a music-composing weekend! I’ll send you a PM with my email.

I may be wrong about the absDiff, but in what we tested our method worked nicely. We supposed that one extends the arm and (from what I understood) absDiff’s threshold value would cut off fingers first and slowly work its way up the arm. We wanted to make it so that the difference step would be capable of reading the very low difference values the depth image gives when the fingers are close to the wall. This is (from what I’ve learned from all of your comments) the very standard way of trying to read “touch” with the Kinect. We first tried ofxOpenCv’s brightnessContrast(float brightness, float contrast) with the purpose of turning completely white any shade of grey above a lower threshold, but for some reason it never worked, so we did it by hand. The random white pixels that pop up were just filtered out by the int minArea of ofxOpenCv’s findContours with very little effort. Also, warpIntoMe handles any homography problems in just one command.

So if this thread is growing with info, let’s fill it with some more for anyone who wants to read it.
Well, it is an ofxCvGrayscaleImage, not an ofImage, but I get your point. 8 bpc gives me a 256-grey palette (2^8) and 11 bpc would give 2048 (2^11). We didn’t change this because it worked very well.
We use a very simple OSC message with the blob centroids (roughly the shape sketched below); this is the worst part of our code because we just started learning OSC, but it works well enough for our purposes. We are just beginning to try ofEvents.
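It’s roughly this shape, using the ofxOsc addon (host, port and the address string are just placeholders, and the sender.setup() call would normally live in setup()):

  
// Rough shape of what we send with ofxOsc (placeholder host, port and address)  
ofxOscSender sender;  
sender.setup("127.0.0.1", 12345);  
  
for (int i = 0; i < encuentracontornos.nBlobs; i++){  
    ofxOscMessage m;  
    m.setAddress("/blob");  
    m.addIntArg(i);                                         // blob index  
    m.addFloatArg(encuentracontornos.blobs[i].centroid.x);  // x  
    m.addFloatArg(encuentracontornos.blobs[i].centroid.y);  // y  
    sender.sendMessage(m);  
}  
  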

I’ve just started using OS X so it’s all very new to me, Xcode and everything.
I will check out Patricio’s code today! Dammit! Days go by!

Did anybody else try this? It is really good. =D
http://www.ludique.cl/labs/ludiques-kinect-bundle/

> We wanted to make it so that absDiff would be capable of reading the very low difference values the depth image would give when the fingers are close to the wall.

This is exactly where the problem is! The gradient created by the Kinect has a very low difference between the surface and fingers touching the wall… I think the difference is less than 1 grey level in a 0-255 pixel image. @irregular’s method (especially the sequence) reduces most of the issues and creates a decent gradient.
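A rough back-of-envelope with assumed numbers (not measured) of why that difference is so tiny:

  
// Assumed numbers, just to show the order of magnitude:  
// Kinect v1 sees roughly 0.8 m to 4.0 m; squeezed into 8 bits, one grey  
// level covers about 12.5 mm, so a fingertip resting on the wall ends up  
// only around one grey level away from the background.  
float mmPerGreyLevel = (4000.0f - 800.0f) / 255.0f;      // ~12.5 mm per level  
float fingerOffsetMm = 15.0f;                            // assumed fingertip thickness  
float greyDifference = fingerOffsetMm / mmPerGreyLevel;  // ~1.2 levels  
  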

Maybe Roy can add more to this. :slight_smile:

@lahiru
Maybe it was already there and I had gotten ahead of myself. =D
I’ll check later :stuck_out_tongue:

@irregular

Yeah, you have definitely passed that issue! :slight_smile: your method seems to create a decent gradient. I am gonna get it confirmed soon anyway.

Btw, just thought I should ask this since you were talking about homography. Is there a difference between finding the homography matrix (to map the projection/image coordinates to screen coordinates) and simply getting the 4 corners of the projection in the captured image, warping the image, and using ofMap to map the coordinates?

screenX = ofMap(touchPoint.x, 0, image.width, 0, ofGetWidth()) ?

Thanks!

@lahiru
In our case we are very familiar with warpIntoMe by now, so we used it. It also allowed us to make a calibrator that explains things nicely on screen in just a few lines of code, but I’m guessing you could use ofMap.

Just in case anyone wants to know: warpIntoMe is an ofxOpenCv function already integrated into openFrameworks. You give it the 4 original corners and the new ones, both as arrays of ofPoints, and it does the homography and processing for you, returning a corrected image very quickly.

But again, even after using warpIntoMe you have to map the image coordinates to the corresponding screen coordinates, don’t you?

(Let’s say you use a 320x240 image and a 1024x768 screen/window. You will get X coordinates from 0 to 320 and have to map them to 0 to 1024, right?)

Oh yeah, that we do =D
We simplified it: since the source of the data is within an image’s width and height, we just multiply the values by a factor. Say the source is 320x240 and you want 800x600 for a standard consumer projector; that’s a 2.5 multiplier (800/320 and 600/240), so blobs end up at Coordinate.x*2.5 and Coordinate.y*2.5 from their original position.
Rudimentary ofMap, nothing fancy.
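In code it’s just this (example numbers for an 800x600 projector; valX/valY are the touch coordinates from the earlier snippet):

  
// 320x240 blob coordinates scaled to an 800x600 projector (example numbers)  
float scaleX  = 800.0f / 320.0f;   // 2.5  
float scaleY  = 600.0f / 240.0f;   // 2.5  
float screenX = valX * scaleX;  
float screenY = valY * scaleY;  
  
// the equivalent with ofMap, using the window size instead of fixed numbers  
float mappedX = ofMap(valX, 0, 320, 0, ofGetWidth());  
float mappedY = ofMap(valY, 0, 240, 0, ofGetHeight());  
  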