Universal Multitouch Wall using Microsoft SDK, openFrameworks and ofxKinectNui


Many Thanks for sharing the code and the technique! :slight_smile:

As you have mentioned, I think the most important part could be the sequence.

What do you mean by a platform to test on your MacBook Pro? We usually use ofxKinect (https://github.com/ofTheo/ofxKinect) with the Xcode bundle of OF.

If you want skeleton tracking and such you can use ofxOpenNI https://github.com/gameoverhack/ofxOpenNI

Of course, for the interactive surface ofxKinect would be the best though.

Again, awesome work!

Well, I have to mention this: the reason behind the modified difference of the ofxOpenCv example was that the absDiff and threshold functions tended to cut off fingers very easily. I’ll make a comparison later. I may be wrong about the whole thing anyway, but I might as well try. :smiley:

Let this be a guide for learning how to do this stuff with the kinect. xD


I think you are correct about the OpenCv functions!

In my code, I use absDiff()… and I warp the image before finding the touch points. This leads to serious cutoff/loss of finger data/pixels in the image. For me, the depth clipping of the Kinect strongly affects the final result.

Just curious, are you using depth clipping of the Kinect along with this method?

I really appreciate that you brought up this topic. Learnt so much. :slight_smile:

Actually, this part of the code is the depth clipping combined with turning the pixels full white. The only function of the Kinect and the addon is to get the depth image; the rest is the code I posted.

for (int i = 0; i < 320*240; i++){
    int valtemp = monoPixels[i] - monoPixels_diff[i];
    if (valtemp < thresholdLow || valtemp > thresholdHigh){
        valtemp = 0;    // Cuts off any pixel outside a certain range.
    } else {
        valtemp = 255;  // Full white. The original used 1000, which does not fit in an unsigned char (it wraps to 232).
    }
    grayPixels[i] = (unsigned char)valtemp;
}

It may be similar to absDiff but I think it does it the other way around. That is why I think it works better, but I’ll have to try it with the Kinect. :stuck_out_tongue:
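For readers comparing the two approaches, here is a minimal sketch of the difference as described in this thread. It uses plain vectors instead of ofxOpenCv images, and the function names (absDiffThreshold, bandThreshold) are made up for illustration. It assumes the single-threshold method marks a pixel white when the absolute difference exceeds the threshold, which is how a binary threshold usually behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sketch of "Method 1": absolute difference followed by one threshold.
// Any change larger than the threshold becomes white, whatever its sign.
std::vector<uint8_t> absDiffThreshold(const std::vector<uint8_t>& current,
                                      const std::vector<uint8_t>& background,
                                      int threshold) {
    assert(current.size() == background.size());
    std::vector<uint8_t> mask(current.size());
    for (std::size_t i = 0; i < current.size(); ++i) {
        int diff = std::abs(int(current[i]) - int(background[i]));
        mask[i] = diff > threshold ? 255 : 0;
    }
    return mask;
}

// Sketch of "Method 2": signed difference kept only inside [low, high].
// Changes with the wrong sign or beyond the high limit are discarded.
std::vector<uint8_t> bandThreshold(const std::vector<uint8_t>& current,
                                   const std::vector<uint8_t>& background,
                                   int low, int high) {
    assert(current.size() == background.size());
    std::vector<uint8_t> mask(current.size());
    for (std::size_t i = 0; i < current.size(); ++i) {
        int diff = int(current[i]) - int(background[i]);
        mask[i] = (diff < low || diff > high) ? 0 : 255;
    }
    return mask;
}
```

With a background of 100 everywhere and current values 100, 103, 140 and 97, a threshold of 2 marks the last three pixels white in the first function, while a band of 2 to 10 keeps only the pixel at 103 in the second. That upper limit and the sign check are the two places where the methods can differ.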

Great to see your code. It is really neat!

To use simpleKinectTouch on OS X I ported the Windows version, which used OpenCV and Qt (the GUI library, not QuickTime); for that I had to compile OpenCV manually. Then I decided to wrap it into an addon (so as to get rid of Qt).

I’ll take a closer look at what OpenCV’s absDiff is actually doing. It seems weird that it behaves differently from your algorithm.

Just a few thoughts on your code. You are using an ofImage to store the depth image instead of an ofShortImage. The ofImage is 8 bits per channel (bpc) and the ofShortImage is 16 bpc. The depth image from the Kinect is 11 bpc. If you use the ofShortImage and the corresponding method from ofxKinectNui that returns the RAW depth image, you’ll get a much better depth resolution.

Another nice thing about the simpleKinectTouch method is that the background is an average image from 100 frames, which gives a much smoother image with far fewer black pixels.
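A minimal sketch of that kind of background averaging, assuming raw 16-bit depth frames stored in plain vectors. The class name BackgroundAverager is hypothetical and this is a plain per-pixel mean, not necessarily what simpleKinectTouch does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates depth frames and returns their per-pixel mean.
class BackgroundAverager {
public:
    explicit BackgroundAverager(std::size_t numPixels)
        : sum_(numPixels, 0), frames_(0) {}

    // Add one frame to the running sum. A 32-bit sum is enough for
    // a few hundred 16-bit frames without overflowing.
    void addFrame(const std::vector<uint16_t>& depth) {
        assert(depth.size() == sum_.size());
        for (std::size_t i = 0; i < depth.size(); ++i) {
            sum_[i] += depth[i];
        }
        ++frames_;
    }

    // Per-pixel mean of all frames added so far.
    std::vector<uint16_t> average() const {
        assert(frames_ > 0);
        std::vector<uint16_t> out(sum_.size());
        for (std::size_t i = 0; i < sum_.size(); ++i) {
            out[i] = static_cast<uint16_t>(sum_[i] / frames_);
        }
        return out;
    }

private:
    std::vector<uint32_t> sum_;
    uint32_t frames_;
};
```

In an application you would call addFrame() once per incoming depth frame during calibration (100 times in the case mentioned above) and then keep the result of average() as the reference image.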
What are you doing with the blobs then? Some sort of tracking/labeling? OSC? TUIO? ofEvents?

As for autocalibration, check out patriciogonzalezvivo’s method. It is quite simple and works very well.

If you’re going to use ofxOpenNI, download the experimental version from gameover’s GitHub. It is much better than the one in his master branch.


BTW, irregular, let’s get together some day. How are you for time? Regards!

Sure! Let’s get together. Sorry for the delay, it was a weekend for composing music! I’ll send you a PM with my email.

I may be wrong about absDiff, but in what we tested our method worked nicely. We supposed that when one extends the arm, (from what I understood) absDiff’s threshold value would cut off the fingers first and slowly go up the arm. We wanted to make it so that absdiff would be capable of reading the very low difference values the depth image would give when the fingers are close to the wall. This is (from what I’ve learnt from all of your comments) the very standard way of trying to read “touch” with the Kinect. We first tried ofxOpenCv’s brightnessContrast(float brightness, float contrast) with the purpose of turning any shade of grey above a lower threshold completely white, but for some reason it never worked, so we did it by hand. The random white pixels that pop up were just filtered out by the int minArea of ofxOpenCv’s findContours with very little effort. Also, warpIntoMe would handle any homography problems in just one command.

So if this thread is growing with info, then let’s fill it with some more for anyone who wants to read it.
Well, it is ofxCvGrayscaleImage, not ofImage, but I get your point. 8 bpc gives me 256 grey levels (2^8) and 11 bpc would give 2048 (2^11). We didn’t change this because it worked very well.
We use a very simple OSC message with blob centroids. This is the worst part of our code because we just started learning OSC, but it works well enough for our purposes. We are just beginning to try ofEvents.

I’ve just started using OS X so it’s all very new to me, Xcode and everything.
I will check out Patricio’s code today! Dammit! Days go by!

Did anybody else try this? It is really good. =D

We wanted to make it so that absdiff would be capable of reading the very low difference values the depth image would give when the fingers are close to the wall.

This is exactly where the problem is! The gradient created by the Kinect has a very low difference between the surface and fingers touching the wall… I think the difference is less than 1 in a 0-255 pixel image. @irregular’s method (especially the sequence) would reduce most of the issues and create a decent gradient.

Maybe Roy can add more to this. :slight_smile:

Maybe it was already there and I had gotten ahead of myself. =D
I’ll check later :stuck_out_tongue:


Yeah, you have definitely got past that issue! :slight_smile: Your method seems to create a decent gradient. I am going to confirm it soon anyway.

Btw, just thought I should ask this as you were talking about the homography. Is there a difference between finding the homography matrix (to map the projection/image coordinates to screen coordinates) and simply getting the 4 corners of the projection on the captured image, warping the image and using ofMap to map coordinates?

screenX = ofMap(touchPoint.x, 0, image.width, 0, ofGetWidth()) ?


In our case, we are very familiar with warpIntoMe by now so we used it. It also allowed us to make a calibrator that can explain stuff nicely on screen and in just a few lines of code, but I’m guessing you could do ofMap.

Just in case anyone wants to know, warpIntoMe is an ofxOpenCv function already integrated into openFrameworks. You give it the 4 original corners and the new ones, both as an array of ofPoints, and it does the homography and processing for you, returning a corrected image very quickly.

But again, even after using warpIntoMe you have to map the image coordinates to the corresponding screen coordinates, don’t you?

(Let’s say you use a 320x240 image and a 1024x768 screen/window. You will get X coordinates from 0 to 320 and have to map them from 0 to 1024, right?)

Oh yeah, that we do =D
We simplified it like this: since the source of the data lies within an image’s width and height, we just multiply the value by a scale factor. Say the source is 320x240 and you want 800x600 for a standard consumer projector; that’s a 2.5 multiplier (800/320 and 600/240), so a blob originally at (Coordinate.x, Coordinate.y) ends up at (Coordinate.x*2.5, Coordinate.y*2.5).
Rudimentary ofMap, nothing fancy.
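The scaling described above can be sketched without openFrameworks as follows. mapRange is a hypothetical stand-in that uses the usual linear-mapping formula (the same idea as ofMap, without clamping), and Point2f and imageToScreen are made-up names for illustration.

```cpp
#include <cassert>

// Linear mapping of value from [inMin, inMax] to [outMin, outMax].
float mapRange(float value, float inMin, float inMax,
               float outMin, float outMax) {
    assert(inMax != inMin);
    return outMin + (value - inMin) * (outMax - outMin) / (inMax - inMin);
}

struct Point2f {
    float x;
    float y;
};

// Convert a blob centroid from image coordinates to screen coordinates.
Point2f imageToScreen(Point2f p, float imageW, float imageH,
                      float screenW, float screenH) {
    return { mapRange(p.x, 0.0f, imageW, 0.0f, screenW),
             mapRange(p.y, 0.0f, imageH, 0.0f, screenH) };
}
```

For a 320x240 image and an 800x600 projector this is the same as multiplying both coordinates by 2.5, so the centre of the image (160, 120) lands on the centre of the screen (400, 300).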

Hi, I haven’t had time to analyze what’s going on with absDiff and irregular’s method.
Anyway, if it works then it’s fine.

Getting the homography matrix might not be the same as using the warpIntoMe method.
The main difference would be the amount of maths that you’ll have to code. I went first for the homography matrix method and it wasn’t nice. Using warpIntoMe is much more straightforward and easier.
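To give an idea of the maths involved in the matrix route, here is a minimal sketch that solves for a homography from 4 point correspondences with Gauss-Jordan elimination. It is independent of openFrameworks and OpenCV; Vec2, Homography, findHomography4 and applyHomography are hypothetical names, and it assumes the last matrix entry can be fixed to 1, which holds for ordinary corner-to-corner mappings like this one.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

struct Vec2 {
    double x;
    double y;
};

// Row-major 3x3 matrix, with h[8] fixed to 1.
using Homography = std::array<double, 9>;

// Solve the 8x8 linear system given by 4 source -> destination corners.
Homography findHomography4(const std::array<Vec2, 4>& src,
                           const std::array<Vec2, 4>& dst) {
    double A[8][9] = {};
    for (int i = 0; i < 4; ++i) {
        double x = src[i].x, y = src[i].y, u = dst[i].x, v = dst[i].y;
        double r0[9] = {x, y, 1, 0, 0, 0, -u * x, -u * y, u};
        double r1[9] = {0, 0, 0, x, y, 1, -v * x, -v * y, v};
        for (int c = 0; c < 9; ++c) {
            A[2 * i][c] = r0[c];
            A[2 * i + 1][c] = r1[c];
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 8; ++r) {
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        }
        assert(std::fabs(A[pivot][col]) > 1e-12);  // degenerate corners
        for (int c = 0; c < 9; ++c) std::swap(A[col][c], A[pivot][c]);
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            double f = A[r][col] / A[col][col];
            for (int c = col; c < 9; ++c) A[r][c] -= f * A[col][c];
        }
    }
    Homography h;
    for (int i = 0; i < 8; ++i) h[i] = A[i][8] / A[i][i];
    h[8] = 1.0;
    return h;
}

// Map a single point through the homography (perspective divide included).
Vec2 applyHomography(const Homography& h, Vec2 p) {
    double w = h[6] * p.x + h[7] * p.y + h[8];
    return { (h[0] * p.x + h[1] * p.y + h[2]) / w,
             (h[3] * p.x + h[4] * p.y + h[5]) / w };
}
```

With this approach only the touch points are transformed, not the whole image, and the destination corners can be given directly in screen coordinates. The price is the extra code shown here, which is what warpIntoMe saves you.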

Maybe editing the first post in this thread with all the collected info might help others so they don’t have to read the whole thread. (There are some forums that I usually read in which moderators do such a job and it is very useful and pleasing to read.)


Good idea!
Give me a few days while I sort everything here. I have a kinect now to make better samples. =)

Hello everyone!
Before I remake my initial post with all the info as roymacdonald suggested I will post my findings.
As has been pointed out, the standard difference method (Method 1) is very similar to what we had programmed (Method 2). Since Method 2 was written a few months ago, I had forgotten why we started from scratch in order to filter the image. I finally remembered, so here is the explanation: Method 1 gave us noise, and in order to lower the noise we had to “cut fingers”. It got worse when we proceeded to do the homography (or in our case warpIntoMe) and to crop and resize the image. Since the initial image had more noise, it was “amplified” and started to appear as blobs in the contour finder. This might of course have been cleaned up with the minArea of ofxOpenCv’s findContours; then again, it might not. So we started from scratch and for some reason it worked very nicely.
We know there are other methods to clean up images but this was far simpler for us (we are not expert programmers). We also realize that some of the problems we had may have been due to using 8 bpc and not 11 bpc as roymacdonald suggested, so to keep it fair I ran this test with both methods at 8 bpc. If you wish to review the code and try it out yourself, of course that is fine. =D

So here come the explanations for the images. I tried to have the best test settings I could and I also tried to be very fair with the variables, since they are slightly different from each other (Method 1 only has one threshold value and Method 2 has two), so pardon me if there are errors in my test methodology. The one element that I used to compare both methods is the hand blob, so the threshold values are set to make the hand look identical with both methods. Both methods were captured at the same time, of course.

Capture 1A and 1B
This is the only image in which the values are not set with reference to the hand (since there is none) and as a result the only one that I would never use as proof. This one is just for explaining the difference in the result of the methods. It even looks like one of the methods gives the opposite blobs, but that is not the case (top left corner).

Capture 2A and 2B
I used the best settings I could to get this image. Notice that the hand is pixel perfect in both cases (I checked). Now notice the noise in 2A that turns into three unwanted blobs in findContours. Method 2 has noise but it is not enough to create a blob.

Capture 3A and 3B
In this one I shifted the settings while trying to have the hand look almost identical (which is harder since there is more noise). Notice that 3A (Method 1) gives more noise and two bigger unwanted blobs. Method 2 has noise but it still cannot make an unwanted blob.

Capture 4A and 4B
Just another demo of noise.

I thought I might add one last thing. You may already know this but since you brought it up before I might as well mention it.
I’ve attached a capture from another infrared camera; in it the Kinect is looking directly at a very common LCD screen. The LCD normally emits infrared light but, contrary to what I mentioned before, I believe the Kinect cannot read it well because of the reflection of the infrared laser (as roymacdonald said). But I’m also thinking that one shouldn’t worry about it, because when a person touches the screen with their hand they would cover part of the reflection and it would work just as well.

By the way, the first post on this thread has been reviewed and changed as suggested. :stuck_out_tongue:


Excellent post! I am sure that many developers will find this useful.

I am also going to do some tests soon… especially on the LCDs. :smiley: Will update as soon as I have something to share.

Again, awesome work!

A quick followup.
I’ve managed to port my code from ofxKinectNui over to ofxOpenNI (experimental branch) on OS X Lion. The purpose of this was to not be dependent on the Windows operating system and also to read raw depth data from the Kinect at 640x480. This was on OS X Lion with the openFrameworks 0071 release from the homepage (not GitHub).

Since the Kinect depth image is 11 bpc I shouldn’t use ofPixels or ofImage (as roymacdonald suggested), but I had some trouble trying to understand what was happening with ofShortPixels and ofShortImage, because when you draw() them they come out wrong. It seems there is a bug and I found out the hard way (there is almost no info about it; see: http://forum.openframeworks.cc/t/ofshortpixels-doesn’t-draw-pixels-correctly/9790/0 )

Anyway, the proper procedure is:
Make two ofShortPixels (640x480, OF_IMAGE_GRAYSCALE) and fill them with openNIDevice.getDepthRawPixels(). One of the ofShortPixels will be only used as a background subtraction.
Now, forget about rendering these two inside your application, it won’t work. Instead you have to code your application blindly (trust me, the correct data is inside there, depth values from around 500 to 4000, millimeters(?) I think).
Make an ‘if’ that compares these two ofShortPixels and fills another regular ofPixels with 0 or 255 values; this is the one that will end up in ofxOpenCv.
Everything else is just like the rest of the code I shared in this thread.
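The comparison step in that procedure could look roughly like the sketch below. Plain vectors stand in for the two ofShortPixels and the output ofPixels, the function name touchMask and the nearMm/farMm parameters are made up for illustration, and it assumes that a raw value of 0 means “no reading”.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build an 8-bit mask (0 or 255) from raw depth and a background frame.
// A pixel is white when it is between nearMm and farMm in front of the
// background surface.
std::vector<uint8_t> touchMask(const std::vector<uint16_t>& depth,
                               const std::vector<uint16_t>& background,
                               int nearMm, int farMm) {
    assert(depth.size() == background.size());
    std::vector<uint8_t> mask(depth.size(), 0);
    for (std::size_t i = 0; i < depth.size(); ++i) {
        // Assumption: 0 marks a pixel the sensor could not measure.
        if (depth[i] == 0 || background[i] == 0) continue;
        // Positive when something is closer to the sensor than the wall.
        int diff = int(background[i]) - int(depth[i]);
        if (diff >= nearMm && diff <= farMm) mask[i] = 255;
    }
    return mask;
}
```

The resulting mask is ordinary 8-bit data, so it can be copied into a regular grayscale image and passed on to the contour finder as before.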

Another tip I can give you: if you wish to see the depth data that you are about to process, what I did was write a procedure that ports my 11 bpc values over to 8 bpc, using 5000 as the max. depth value. Never mind that ofShortPixels is 16 bits; you only want each pixel’s integer value. There was little to no hit on performance.
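A minimal sketch of such a preview conversion, assuming raw depth values in a plain vector and a simple linear scale clamped at the maximum depth. The function name depthToPreview is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale raw depth values to 0-255 for display only.
// Values above maxDepth are clamped to white.
std::vector<uint8_t> depthToPreview(const std::vector<uint16_t>& depth,
                                    uint16_t maxDepth = 5000) {
    assert(maxDepth > 0);
    std::vector<uint8_t> out(depth.size());
    for (std::size_t i = 0; i < depth.size(); ++i) {
        uint32_t d = std::min<uint32_t>(depth[i], maxDepth);
        out[i] = static_cast<uint8_t>(d * 255u / maxDepth);
    }
    return out;
}
```

This loses precision on purpose, so it should only be used for looking at the data; the touch detection itself should keep working on the raw values.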

That is about as much info as I have and I’m gonna change topics and rarely read this thread. :stuck_out_tongue:
OP has delivered! (too much reddit lately).
Good luck everyone!

I know that this topic is pretty old, but I have a problem with the solution that was presented here. I understood all of the steps to get touch information from a Kinect or another PrimeSense sensor (in my case an Asus Xtion).
The most important thing here is to get a 3D image of the surface. After this calibration we can detect which pixels are “disturbed” and then run all the OpenCV magic.
But the Kinect sometimes goes crazy with vertical bands. The reference 3D image of the surface then differs dramatically; here is an example: https://www.dropbox.com/s/vw0bb9iyhw831le/touch-mask.png . I know that I can filter this white region by size, but this part of the surface is still untouchable.
Increasing the number of calibration frames does not always help.
Any ideas?

OK, I don’t understand one step in irregular’s example code: the image thresholding with OpenCV (line 24, method update). The grayImage image data comes from grayPixels, which can have only 2 values: 1000 and 0. I thought the thresholding was done in the for loop (lines 13-21).