So I’m working on some tracking using an overhead Kinect camera (shown above) pointed down at a big screen. I want to work out the position of people’s hands touching/pointing at the screen, so I can use those screen positions for interaction.
There is a setup phase where you extend your hand to the first point (screen top left) and it stores the blob’s x position and depth; you then repeat for the other three corners (top right, bottom left, bottom right). This gives me horizontal boundaries (min x and max x) for the top and bottom of the screen, and a min/max depth.
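For reference, this is roughly what gets stored during setup (names are just for illustration):

```cpp
// one sample per screen corner, captured during the setup phase
struct CornerSample {
    float x;     // blob x position in camera pixels
    float depth; // kinect depth reading at that corner
};
CornerSample topLeft, topRight, bottomLeft, bottomRight;
```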
Then each frame I take a blob, turn its x position into a 0 to 1 value based on where it sits within the bottom-of-screen range, and use the depth range to control the y position on screen.
This works, but because the camera is overhead, looking down at the screen at an angle, the top of the screen appears almost twice as wide in camera pixels as the bottom of the screen.
This means the tracking works fine at the bottom, but as you move your hand straight up, the point on screen moves upwards as expected but also drifts away from the centre, because of the perspective.
If the camera were further away, I would normally see the whole screen, use a boxwithCorners addon to set 4 points in the image, unwrap the image, and run blob detection on just that section; but because the camera is so close to the wall, it doesn’t see very much of the screen.
So I end up with:
y = 0.0 - 1.0 (based on the depth range)
x = 0.0 - 1.0 (based on the x pixel position within the bottom-of-screen range)
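In openFrameworks terms, the current per-frame mapping is basically this (using the corner samples sketched above; blobX/blobDepth are placeholders):

```cpp
// naive mapping: depth -> y, x within the bottom-of-screen range -> x
float v = ofMap(blobDepth, bottomLeft.depth, topLeft.depth, 0.0f, 1.0f, true);
float u = ofMap(blobX, bottomLeft.x, bottomRight.x, 0.0f, 1.0f, true);
```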
I’ve tried various things, like taking the ratio between the bottom and top screen widths as a scale factor, multiplying it by the depth of your hand, and shifting the x position as you move up based on that, but it doesn’t work.
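Here’s a rough sketch of that attempt, in case I’m doing something obviously wrong (names as above; the exact blend I’ve tried has varied):

```cpp
float widthBottom = bottomRight.x - bottomLeft.x;
float widthTop    = topRight.x    - topLeft.x;
float scale = widthTop / widthBottom;  // ~2.0 in my setup
float v = ofMap(blobDepth, bottomLeft.depth, topLeft.depth, 0.0f, 1.0f, true);
float centreX = (bottomLeft.x + bottomRight.x) * 0.5f;
// squeeze x back toward the centre as the hand moves up the screen
float correctedX = centreX + (blobX - centreX) / ofLerp(1.0f, scale, v);
float u = ofMap(correctedX, bottomLeft.x, bottomRight.x, 0.0f, 1.0f, true);
```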
Does anyone know a better way to do this? Should I be using quad/matrix transformations? I’m not very familiar with them, so I haven’t got my head around them yet.
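From what I can tell, this might be the sort of thing a perspective (homography) transform handles: treat the four calibration samples as a quad in (x, depth) space and map it to the unit square. A minimal untested sketch with OpenCV, assuming the corner samples above:

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// the four corner samples as points in (x, depth) space
std::vector<cv::Point2f> srcQuad = {
    { topLeft.x,     topLeft.depth     },
    { topRight.x,    topRight.depth    },
    { bottomRight.x, bottomRight.depth },
    { bottomLeft.x,  bottomLeft.depth  }
};
// where each corner should land in normalized screen space
std::vector<cv::Point2f> dstQuad = {
    { 0.f, 0.f }, { 1.f, 0.f }, { 1.f, 1.f }, { 0.f, 1.f }
};
cv::Mat H = cv::getPerspectiveTransform(srcQuad, dstQuad);

// per frame: push the blob's (x, depth) through the homography
std::vector<cv::Point2f> in = { { blobX, blobDepth } }, out;
cv::perspectiveTransform(in, out, H);
float u = out[0].x; // 0..1 across the screen
float v = out[0].y; // 0..1 down the screen
```

If that’s right, it would correct the x narrowing and the depth mapping in one transform, but I’d love to hear if there’s a simpler or more robust approach.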