Kinect tracking, unwrap blob points using matrix?

So I’m working on some tracking using an overhead Kinect camera (shown above) looking down on to a big screen. I want to work out where people’s hands are touching or pointing at the screen, so that the hand position maps to the matching position on screen for interaction.

There is a setup phase where you extend your hand to the first point (screen top left) and it stores the blob X and depth; this is then repeated for the other three corners (top right, bottom left, bottom right). This gives me horizontal boundaries (min x and max x) for the top and bottom, and a min/max depth.

Then each frame I take a blob and turn its x position into a 0 to 1 value based on its position within the bottom-of-screen range, and use the depth range to control the y position on screen.

This works, but because the camera is overhead, the perspective of it looking down on the screen means the top of the screen is almost double the width, in camera pixels, of the bottom of the screen.

This means at the bottom the tracking works fine, but as you move your hand upwards in a straight line, the point on screen moves upwards as expected but moves out away from the centre, because of the perspective.

If the camera was further away, I would normally see the whole screen, use a boxwithCorners addon to set 4 points in the image and unwrap the image and do blobs on just that section, but as the camera is close to the wall, it doesn’t see very much of the screen.

So I end up with
y = 0.0 - 1.0 (based on depth range)
x = 0.0 - 1.0 (based on x pixel position within range)

I’ve tried various things like taking the ratio between bottom and top of screen as scale factor, multiplying it by depth of your hand and moving the x position as you move up based on that, but it doesn’t work.

Does anyone know a better way to do this? Should I be using quad matrix transformations? I’m not very familiar with them, so I’ve not got my head around them yet.

Many thanks

Hey Chris, curious did you find a way to solve this?

I think your best bet will be using two Kinects, one above and one at the bottom, and using the data from one of them depending on where the hands are.


No, not solved it yet. It’s definitely possible with one Kinect; it’s probably just about calculating a transformation matrix, but I’ve not used them before :frowning:


Hi Chris!
I had a moment of fun in R and found that, assuming that Y goes from 0 (bottom) to 1 (top), which is the case, and that the quadrilateral is symmetrical with respect to X=0, this formula should work fine to warp and unwarp at the top:

X’ = X*(1 - Y*warp_factor)

warp_factor can be set between -1 and 1 and setting it to negative will actually “unwarp”

If you don’t have R you can run the following R script and try changing the values at

warp_factor = 0.33 # set this to negative to unwarp

#draw a thing that looks like a square on XY

x=x*(1-(y*warp_factor)) # new x warping formula


The 4 calibration points might not be defining the bounds as you intend. The top left and top right will have roughly the same Z value, but the top centre (closest position of your screen to the camera) will have a closer Z value. This is probably what’s causing the drag as you move up/down.

You could do a quick test with your calibration method on one side of the screen- eg. top left and bottom left remain the same, but top right and bottom right will be in the centre (directly down the line of the camera). Then subtract .5 from the X value, and see if your software is working on the left side of the screen.

Hi Chris,
Did you solve the problem? What method did you use for tracking the hand blobs?
Looking forward to your reply.

One idea would be to iterate over the pixels in the area you are interested in (or just use the centroid of the blob?) and project those points into 3D space based on the projection matrix of the Kinect camera. Luckily ofxKinect makes this procedure really simple with the getWorldCoordinateAt() method. Basically, it takes a 2D coordinate from the color stream and converts it to a 3D coordinate. That way you get the real x, y coordinates (relative to the Kinect’s position, of course) instead of the perspective-projected coordinates you get from the depth image. Just remember to call kinect.setRegistration(true); after initializing so that the depth and RGB streams are aligned.