How to map coordinates between different 3D spaces?

I have a question about mapping coordinates between 3D and 2D space.
This stems from using the ofxKinectForWindows2 addon, but is really a general question about how to map coordinates (possibly using ofMatrix4x4?).

I have a set of 3D points (skeleton tracking coordinates) that correspond to physical meters, where the Kinect sensor is the origin. I also have the corresponding 2D points of the skeleton in pixel dimensions for the Kinect color image.

What I would like to do is draw the Kinect color image as a plane in a 3D openFrameworks scene (i.e. using ofEasyCam), and then draw the skeleton tracking points in 3D on top of that scene - in positions that correspond to the camera’s viewpoint of the color image.

I’m guessing this requires matrix math? Is this homography? Basically I want to map the 3D skeleton coordinates from meters in space to a specific 3D camera space.

I understand you want to project the image plane into 3D space, and be able to map 2D points of the image onto that 3D plane.

This is basically the screenToWorld method from ofCamera. The underlying matrices are the camera’s view and projection matrices, which map between 2D and 3D using the camera’s position and orientation, as well as the focal distance from the image plane to the optical center.
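As a minimal sketch (the camera and point here are just placeholders, not from your setup): worldToScreen goes from 3D to 2D, and screenToWorld is its inverse, given a depth value in z.

ofCamera cam; // at the origin; note ofCamera looks down the negative z axis by default

// 3D world position -> screen x/y in pixels, plus a depth value in z
ofVec3f screenPos = cam.worldToScreen(ofVec3f(0, 1, -2));

// screen x/y in pixels plus a depth value in z -> back to the 3D world position
ofVec3f worldPos = cam.screenToWorld(screenPos);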

The issue with screenToWorld is that it uses OpenGL depth values for the Z component of the screen ofVec3f. This depth relates to the near and far planes of the camera. I had a method lying around to convert distance to depth; I haven’t used it in a while, but I remember it worked.

static float distanceToDepth(float distance, float nearClip, float farClip)
{
	// Converts a distance from the camera (in world units) into the non-linear
	// value stored in the OpenGL depth buffer, given the camera's near/far clip planes.
	// https://www.opengl.org/discussion_boards/showthread.php/154989-How-to-get-the-real-depth-value
	// dist = (NEAR_PLANE*FAR_PLANE/(NEAR_PLANE-FAR_PLANE))/(zbuffer[ii][iii]-FAR_PLANE/(FAR_PLANE-NEAR_PLANE));
	float nearXfar = nearClip * farClip;
	float nearMfar = nearClip - farClip;
	float farMnear = -nearMfar;
	float A = nearXfar / nearMfar;    // near*far / (near - far)
	float B = farClip / farMnear;     // far / (far - near)
	float depth = (A / distance) + B; // invert the formula above to solve for the depth value
	return depth;
}
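For example, a quick usage sketch (cam and the 2 meter distance are just placeholders, assuming the scene units are meters):

float jointDistance = 2.0f; // distance from the sensor, in the same units as the scene
float depth = distanceToDepth(jointDistance, cam.getNearClip(), cam.getFarClip());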

Hi Hennio - Thanks so much for the handy function, and explanation about screenToWorld depth.
It’s not exactly what I’m going for, but could come in handy.

Here’s what I’m trying to do, hopefully more clearly this time (my initial post was confusing):

I have:

  • a set of skeleton tracking coordinates in 3D. The coordinates are in meters - so for instance, pt(0,1,2) is centered on x, 1 meter up, and 2 meters away from the origin, which represents the Kinect depth sensor location.

  • corresponding skeleton points in X,Y coordinates overlaid on a 2D color image of the scene, also generated by the Kinect

  • an ofCamera at the origin, and the 2D color image drawn at a distance that makes it fill the camera view

I want to find:
the scale/translation/orientation I need to apply to the 3D skeleton points to line them up with the 2D skeleton points - so that, when viewed from the ofCamera at the origin, they line up perfectly, but when viewed from another angle, it is clear that they have depth.

Basically, the Kinect SDK allows me to take the 3D skeleton coordinates in physical space and translate those to 2D coordinates on the color image. But I lose the depth in that process. I’m trying to get the 2D coords but keep the depth.
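For illustration, that SDK mapping looks roughly like this with the raw Kinect v2 API (how the ICoordinateMapper is obtained through ofxKinectForWindows2 may vary; coordinateMapper here is just a placeholder):

CameraSpacePoint joint3D = { 0.0f, 1.0f, 2.0f }; // meters, sensor at the origin
ColorSpacePoint joint2D = { 0.0f, 0.0f };        // pixels in the color image
coordinateMapper->MapCameraPointToColorSpace(joint3D, &joint2D);
// joint2D.X / joint2D.Y are the color-image coords; joint3D.Z (the depth) is what gets lost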

I see!

I will expand on the answer; my first one was too brief.

That set of 2D points (the skeleton overlaid on the color image) corresponds to the 3D skeleton points, but seen from the camera’s point of view. If you “project” these points into 3D space using the depth from the Kinect (“2 meters away from the origin”), then you will have the skeleton points in 3D.

To project, we use screenToWorld, which takes 2D points on the screen (“ofCamera at origin”) and projects them into the 3D world (“it would be clear that they have depth”).

Here is a diagram for better comprehension ^^’’

We create an ofVec3f for each skeleton coordinate: X and Y from the screen coords (in pixels), and Z from the Kinect depth converted to OpenGL depth with the method posted before. We pass that screen coord (with depth) to the screenToWorld method of the ofCamera representing the Kinect position, and voila, we get the corresponding 3D world position for that pixel (skeleton coord). A rough sketch follows below.
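Something like this (a rough sketch - colorPoints, jointDistances and cam are placeholder names, and I’m assuming the scene units are meters and the color image fills the viewport):

// colorPoints:    std::vector<ofVec2f> of 2D skeleton joints in color-image pixels
// jointDistances: std::vector<float> with each joint's distance from the sensor (meters)
std::vector<ofVec3f> skeleton3D;
for (size_t i = 0; i < colorPoints.size(); i++) {
	// Kinect distance (meters) -> OpenGL depth for this camera's near/far planes.
	// Depending on your openFrameworks version, screenToWorld may expect z in
	// normalized device coordinates (-1..1) instead of 0..1; if the depth looks
	// wrong, try depth * 2.0f - 1.0f here.
	float depth = distanceToDepth(jointDistances[i], cam.getNearClip(), cam.getFarClip());

	// screen x/y in pixels (color image assumed to fill the view) plus the depth
	ofVec3f screenPoint(colorPoints[i].x, colorPoints[i].y, depth);

	// project back into the 3D world seen by the camera
	skeleton3D.push_back(cam.screenToWorld(screenPoint));
}

// When drawn, the points should line up with the color image through cam,
// and reveal their depth when the scene is orbited with ofEasyCam.
for (const auto & p : skeleton3D) {
	ofDrawSphere(p, 0.02f); // small sphere per joint; radius in scene units
}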

I hope I got it right this time :slight_smile:

Oh that makes sense now!
When you wrote screenToWorld() I read it as worldToScreen(), so I thought distanceToDepth() was for converting the z from that output - then I was confused. This really clarified it, and I will give that a shot today! Thanks so much.