Coordinate Transformation (from depth image or point cloud to surface)

Hi,

I have a depth image from a Kinect taken at an angle. I would like to have this data set as though the view were top-down. I think this is just a transformation from the actual POV (which is not known but can be derived from the data) to a virtual POV with the eye(s) directly above each point.

An OF user did this (or something similar) in the past and pointed to his code here: https://github.com/kylemcdonald/DohaInstallation/blob/master/addons/ofxVirtualCamera/ofxVirtualCamera.cpp

However, I’ve never coded OF before, my C/C++ is rusty, and this addon doesn’t seem to be supported (no example code, no install.xml, etc.).

Trying to dig through the API docs (I’m almost instantly lost), I did find RangeImage, and it seems like this class might be a place to start. I think its inputs are point clouds rather than depth images, but if I can convert my depth image to a point cloud, then maybe RangeImage can do the transformation and output the desired result?
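
From what I’ve read, the depth-image-to-point-cloud step is just the pinhole camera model applied per pixel. Something along these lines is what I have in mind (untested sketch; the fx/fy/cx/cy intrinsics are rough Kinect figures I found rather than calibrated values, and depthToPointCloud/depthMM are just names I made up):

std::vector<ofVec3f> depthToPointCloud(const std::vector<unsigned short>& depthMM,
                                       int width, int height) {
    const float fx = 594.2f, fy = 591.0f;  // focal lengths in pixels (rough Kinect values)
    const float cx = 320.0f, cy = 240.0f;  // principal point (image centre)
    std::vector<ofVec3f> cloud;
    cloud.reserve(width * height);
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            float z = depthMM[v * width + u] * 0.001f; // millimetres -> metres
            if (z <= 0) continue;                      // skip pixels with no depth reading
            // back-project the pixel through the pinhole model
            cloud.push_back(ofVec3f((u - cx) * z / fx, (v - cy) * z / fy, z));
        }
    }
    return cloud;
}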

Any hints or advice on how to go about doing this would be much appreciated. I would like to perform this operation on tens of thousands of frames. I have no desire to display the results graphically; other code will take the output and do the analysis. It is just that the analysis can only be done if the data is acquired looking straight down.

Thanks,

-k.

Unless I’m misunderstanding, a lot of what you’re looking to do could be done by creating a mesh and rotating it? If that’s the case, you can create an ofMesh instance, add the vertices of all the points from the Kinect, and then either rotate the mesh or use an ofCamera (or ofEasyCam) and set the view that way. Not sure if that’s helpful or just more confusing.
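
Something like this (untested sketch assuming ofxKinect; getDistanceAt() and getWorldCoordinateAt() are the ofxKinect calls I mean, and the width/height members may differ depending on your version):

ofMesh mesh;
mesh.setMode(OF_PRIMITIVE_POINTS);
for (int y = 0; y < kinect.height; y++) {
    for (int x = 0; x < kinect.width; x++) {
        if (kinect.getDistanceAt(x, y) > 0) {                  // skip pixels with no depth
            mesh.addVertex(kinect.getWorldCoordinateAt(x, y)); // point in camera space
        }
    }
}
// then in draw(): cam.begin(); mesh.drawVertices(); cam.end();  (cam being an ofEasyCam)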

In point-cloud form it is, I think, just a rotation. From the depth image it is more than a rotation, because there are offsets between pixels that turn into shadows, multiple pixels that compress to one, or pixels that get removed entirely. For example, if the original POV looked under an overhang, those pixels need to be removed when re-projecting to a top-down view.

I can understand in my head some of the mathematical operations I need to do:

  1. Assume the camera is at (0,0,0) and looking, for example, horizontally to the north
  2. Convert the depth image to a point cloud, so I have 640*480 (307,200) points, presumably each as an X, Y, Z vector
  3. Define the plane which is truly horizontal. Currently I do this by fitting a plane to the surface and minimizing the errors (a minimal version of this fit is sketched below). In the future, a cup of water will be in the scene, where the water surface defines “horizontal”
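
For reference, the plane fit in step 3 is currently just an ordinary least-squares fit, roughly like this (sketch only; fitPlaneNormal is just a name I’m using here, and it assumes the floor dominates the frame, otherwise something more robust like RANSAC would be needed):

ofVec3f fitPlaneNormal(const std::vector<ofVec3f>& pts) {
    // centre the cloud so the fitted plane passes through the centroid
    ofVec3f centroid(0, 0, 0);
    for (const ofVec3f& p : pts) centroid += p;
    centroid /= (float) pts.size();

    // accumulate the 2x2 normal equations for z = a*x + b*y on centred points
    double sxx = 0, sxy = 0, syy = 0, sxz = 0, syz = 0;
    for (const ofVec3f& p : pts) {
        ofVec3f q = p - centroid;
        sxx += q.x * q.x; sxy += q.x * q.y; syy += q.y * q.y;
        sxz += q.x * q.z; syz += q.y * q.z;
    }
    double det = sxx * syy - sxy * sxy; // assumes the points are not all collinear
    double a = (sxz * syy - syz * sxy) / det;
    double b = (syz * sxx - sxz * sxy) / det;

    // the plane z = a*x + b*y has normal (a, b, -1); return it normalised
    return ofVec3f(a, b, -1).getNormalized();
}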

Now it gets a bit fuzzy for me. I think the remaining steps are:

  1. For each point find the transformation from the actual POV to the virtual POV, normal to the horizontal plane, and above the point
  2. Perform the rotation (and translation?) (presumably with existing OF functions)
  3. Find the projection of the points onto the new surface, dealing with situations like walls and overhangs where multiple points get removed or averaged to one (a rough sketch of steps 1-3 follows this list)
  4. Write out the new data
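
What I picture for steps 1-3, very roughly (untested sketch; projectTopDown, the grid dimensions, the cell size and the choice of +Y as “up” are all just placeholders of mine, the sign of the fitted normal may need flipping, and I’m assuming from the docs that makeRotationMatrix can take a from/to pair of vectors):

std::vector<float> projectTopDown(const std::vector<ofVec3f>& cloud,
                                  const ofVec3f& planeNormal,
                                  int gridW, int gridH, float cellSize) {
    // rotation that takes the fitted "horizontal" normal onto the up axis
    ofMatrix4x4 rot;
    rot.makeRotationMatrix(planeNormal, ofVec3f(0, 1, 0));

    // height map, one value per grid cell; start with "no data"
    std::vector<float> heights(gridW * gridH, -1e30f);

    for (const ofVec3f& p : cloud) {
        ofVec3f q = p * rot;                          // rotate into the top-down frame
        int gx = (int) floorf(q.x / cellSize) + gridW / 2;
        int gz = (int) floorf(q.z / cellSize) + gridH / 2;
        if (gx < 0 || gx >= gridW || gz < 0 || gz >= gridH) continue;
        // keep only the highest point per cell, so anything under an
        // overhang is discarded and multiple hits collapse to one
        int idx = gz * gridW + gx;
        if (q.y > heights[idx]) heights[idx] = q.y;
    }
    return heights; // step 4: write these out however the analysis code expects
}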

Ok, I get it. I think you want something like:

  
ofMatrix4x4 rot, trans;  
rot.makeRotationMatrix( 90, ofVec3f(1, 0, 0) ); // 90 degrees around an axis (the second argument is an axis, not a position)  
trans.makeTranslationMatrix( 0, yTrans, zTrans ); // move up and over  
  
ofVec3f newPoint = oldPoint * rot * trans;  

That just flips the point over, though, and will be insufficient if you want to account for the actual view and optics of the camera. I’m not sure I have much to suggest on the occlusion stuff, unfortunately.