Kinect + OpenNI + OpenCV blob tracking performance

I am currently doing a project where I have to build a blob tracker using the Kinect. The application has to log blob information and track people through a room.

The OpenNI scene analyzer provides some useful information, i.e. it can retrieve masked camera images for all users that are being tracked. I can feed this masked image into OpenCV to extract the blob information (contour, centroid, etc.). Afterwards I can run my own analysis on those contours and store the necessary data. Up to this point it all seems pretty neat, but my biggest problem is performance.
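For illustration, the masked-image step I mean looks roughly like this (a sketch against the OpenNI 1.x and OpenCV 2.x C++ APIs, not my exact code; the function name is just a placeholder):

// Rough sketch of the mask -> contour step.
// 'userGen' is an xn::UserGenerator; error handling omitted for brevity.
#include <XnCppWrapper.h>
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

std::vector<std::vector<cv::Point> > findUserBlobs(xn::UserGenerator& userGen, XnUserID user)
{
    xn::SceneMetaData smd;
    userGen.GetUserPixels(user, smd);           // per-pixel label map for this user

    const XnLabel* labels = smd.Data();
    cv::Mat mask(smd.YRes(), smd.XRes(), CV_8UC1, cv::Scalar(0));
    for (int i = 0; i < mask.rows * mask.cols; ++i)
        if (labels[i] == user)
            mask.data[i] = 255;                 // binary mask of the user's pixels

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(mask, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
    return contours;                            // centroid etc. via cv::moments()
}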

The OpenNI data generators run at a default of 30 fps and a resolution of 640x480. Is there any way to change this and what are the possible implications? I’ve tried altering the map output mode, but that seems to crash the application when I call the SetMapOutputMode() function during setup.
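For reference, the kind of setup I am attempting looks roughly like this (a simplified sketch, not my actual wrapper code; whether a mode like 320x240 is supported is device/driver dependent, so the return codes should be checked):

// Sketch of changing the output mode of an OpenNI 1.x generator before generation starts.
#include <XnCppWrapper.h>

XnStatus setupDepth(xn::Context& context, xn::DepthGenerator& depth)
{
    XnStatus rc = depth.Create(context);
    if (rc != XN_STATUS_OK) return rc;

    XnMapOutputMode mode;
    mode.nXRes = 320;   // instead of the default 640
    mode.nYRes = 240;   // instead of the default 480
    mode.nFPS  = 30;
    rc = depth.SetMapOutputMode(mode);          // must be set before StartGeneratingAll()
    if (rc != XN_STATUS_OK) return rc;

    return context.StartGeneratingAll();
}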

Furthermore, the OpenCV contour finder seems computationally heavy, cutting my framerate in half from 30 fps to 15 fps or even lower. For my application a minimum of 30 fps is desirable so that I can reliably track people who are moving around the room.
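To make the question concrete, the kind of cheaper contour pass I am asking about might look like this (just a sketch with an arbitrary 0.5 scale factor; I have not measured how much it actually helps):

// Sketch: run findContours on a half-size copy of the binary user mask,
// keeping only outer contours with simple chain approximation.
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

std::vector<std::vector<cv::Point> > findBlobsFast(const cv::Mat& mask)
{
    cv::Mat small;
    cv::resize(mask, small, cv::Size(), 0.5, 0.5, cv::INTER_NEAREST);  // 640x480 -> 320x240

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(small, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
    // contour coordinates are in the downscaled frame; scale by 2 to map back
    return contours;
}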

How can I optimize my application for OpenNI/OpenCV? Or are there any other suggestions for my approach on blob tracking?

Related question: when OpenNI skeleton tracking is enabled, it constantly looks for users standing in the ‘psi’ pose to calibrate the skeleton. The only part I am interested in is the scene analyzer (as explained above), i.e. getting a person’s blob from the camera image without skeleton tracking. But it seems the whole skeleton tracking pipeline needs to be enabled to get the blob info I need. Is there any way to disable skeleton tracking/calibration and still get the blob information?

Another quick question: how does the center of mass function (from the OpenNI user generator) map to X and Y coordinates? My first thought was that it maps the center of a blob to [-resolution/2, resolution/2] for both x and y, so the center of the camera image is at (0, 0) and the top-right corner is at (320, 240) for a 640x480 resolution.

But when I try it out, the mapping seems to work a bit differently. The top image is correct, but the bottom one is positioned outside the blob (I remapped the center of mass to [0, 640] and [0, 480] for the (x, y) coordinate).

Edit: Ok, I fixed the problem. UserGenerator::GetCoM() returns a real-world coordinate; I needed to project it to screen coordinates using a function from the DepthGenerator:

  
  
XnPoint3D com;
recordUser.getXnUserGenerator().GetCoM(i, com);                                  // center of mass in real-world coordinates (mm)
recordDepth.getXnDepthGenerator().ConvertRealWorldToProjective(1, &com, &com);   // convert to pixel coordinates

int x = com.X;
int y = com.Y;

First of all, are you using Generator::IsNewDataAvailable()? Skipping iterations when there is no new frame can free up some substantial cycles for your OpenCV work. The next obvious thing to try is to run the Kinect/OpenNI capture and the OpenCV processing in their own threads.
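Roughly what I mean, as a sketch against the OpenNI 1.x C++ wrapper (adapt the names to your own loop):

// Sketch: gate the expensive OpenCV pass on new Kinect data so it runs at
// most once per frame; 'context' and 'depth' are already initialized.
#include <XnCppWrapper.h>

void processIfNewFrame(xn::Context& context, xn::DepthGenerator& depth)
{
    if (!depth.IsNewDataAvailable())
        return;                        // no new frame yet: keep the CPU free

    context.WaitOneUpdateAll(depth);   // pull in the new frame

    // ... build the user mask and run the contour/blob tracking here ...
}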
