Merging point clouds from multiple Kinect v2s


I’m working on a prototype for an installation (project for uni) where a user will be tracked in an open space. The user will have to navigate through a simple maze blindfolded and will receive feedback through vibrations to find the way.

To track the user I was thinking of using 2 Kinects (or more if I can get my hands on a Mac with enough USB buses) so I can overlap the views a little and cover a bigger space. I’ve got ofxMultiKinectv2 working and have read up (including Multiple Kinect setup for real time volumetric reconstruction and tracking of people) on how to go about merging views from multiple Kinects, where a variety of approaches are mentioned.

I’ve had a go at merging the point clouds with ICP, but have so far failed to get ofxPCL to work on my machine. Using AR markers seems a good approach to me, as it’s fairly simple to perform once programmed. However, I’ve hit a wall figuring out how to derive the matrix needed to place the point clouds in the same coordinate space, which makes it evident that my knowledge of graphics math/OpenGL is falling short.

So how would you derive this from the ModelView matrix? As a first step I’ve tried to mimic the camera movement relative to an AR marker, but playing around in the ofxAruco example I couldn’t get further than the rotation and movement; I couldn’t get the position. Have I mixed up different kinds of matrices, or is there some more maths involved?
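For anyone reading along, here is a minimal sketch of one common way to get from marker poses to a camera-to-camera matrix. It assumes each Kinect sees the same marker and that the detection gives a 4x4 rigid matrix mapping marker coordinates into that camera's coordinates (which is what a model-view matrix for the marker represents). The function names are hypothetical and not part of ofxAruco, the poses are made up, and the matrices use the column-vector convention; a library that uses row vectors would store the transpose and multiply in the opposite order.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(m):
    """Invert a rigid transform [R | t]; the inverse is [R^T | -R^T t]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]  # R transposed
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [inv_t[0]], rt[1] + [inv_t[1]], rt[2] + [inv_t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def transform_point(m, p):
    """Apply a 4x4 transform to a 3D point (column-vector convention)."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))

def pose_rot_y(angle_deg, tx, ty, tz):
    """Hypothetical helper: a pose built from a Y rotation and a translation."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, 0.0, s, tx], [0.0, 1.0, 0.0, ty], [-s, 0.0, c, tz],
            [0.0, 0.0, 0.0, 1.0]]

# Made-up marker poses as seen from each camera (marker coords -> camera coords).
marker_in_a = pose_rot_y(30.0, 0.2, 0.0, 2.0)
marker_in_b = pose_rot_y(-45.0, -0.5, 0.1, 2.5)

# The camera's position in marker space is the translation of the inverted pose.
cam_a_in_marker = invert_rigid(marker_in_a)
cam_a_position = tuple(cam_a_in_marker[i][3] for i in range(3))

# B -> marker -> A: moves camera B's points into camera A's coordinate space.
b_to_a = mat_mul(marker_in_a, invert_rigid(marker_in_b))
```

The translation column of the inverted pose is what gives the camera position, which may be the missing piece when only rotation seems to come through. Every point from camera B would then go through transform_point(b_to_a, p) before being drawn alongside camera A's cloud.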

Also, I came across homography in OpenCV. Am I correct in thinking that if I derived a homography matrix and applied it to a point cloud it wouldn’t work correctly, as it’s a technique for 2D?

Any help in understanding how this works would be much appreciated!

Thanks in advance!

I experimented before with OpenCV to make a sort of auto-calibration, using a checkerboard test pattern to figure out each camera’s transformation relative to the other cameras, and then used that to transform the point clouds. It kind of worked, but it was not precise, so in the end it would always need some manual adjustment.

I recently merged 3 Kinects into a single point cloud for an express commercial project. As I knew that the OpenCV approach I had tried before was not foolproof, I went for the less elegant solution of aligning the point clouds manually.
I simply used ofxManipulator to transform the point clouds until they were aligned. It was not perfect but it was OK, and more importantly I was able to deliver on time. Maybe if I had had more time I would have gone for a completely automated method.


I did this once. First I manually aligned two frames in MeshLab to create a first approximation for PCL’s ICP. Then I ran ICP to finish the alignment off. I could not get the latest PCL to compile, but a previous version worked.

I did this for one frame, since the cameras’ positions were fixed, then applied the transforms in real time.
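A minimal sketch of that "calibrate once, apply every frame" idea, assuming the alignment step has already produced one 4x4 matrix per camera that maps its points into a shared space. The function names and the B_TO_A matrix are made up for illustration, and points are plain (x, y, z) tuples in metres.

```python
def apply_transform(m, points):
    """Apply a 4x4 row-major transform to a list of (x, y, z) points."""
    return [tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                  for i in range(3))
            for x, y, z in points]

def merge_clouds(clouds, transforms):
    """Move each camera's cloud into the shared space and concatenate them."""
    merged = []
    for points, m in zip(clouds, transforms):
        merged.extend(apply_transform(m, points))
    return merged

# Camera A defines the shared space, so its transform is the identity.
IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# Hypothetical calibration result: camera B sits 1 m to the right of camera A.
B_TO_A = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# One frame from each camera; both cameras see the same physical point.
frame_a = [(0.0, 0.0, 2.0)]
frame_b = [(-1.0, 0.0, 2.0)]
merged = merge_clouds([frame_a, frame_b], [IDENTITY, B_TO_A])
```

Since the cameras do not move, the matrices only need recomputing if a Kinect gets bumped; the per-frame cost is just one matrix multiply per point.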

As far as OpenCV is concerned, I think there are a lot of tutorials and blog posts about setting up stereo vision. Stereo vision requires calibrating both cameras and figuring out their relative positions and orientations.


I’ve worked with up to six Kinects on a PC, but I have found that the Belkin box is a good way to increase the number usable on a Mac laptop, as long as your computer has a Thunderbolt port. (I was working with three Kinects on a Mac mini; the internal USB bus could only handle two, but the Belkin allowed three.) All of my setups were with the original Xbox Kinect. The v2 is the Xbox One Kinect, right? I’m not sure if its data stream is as controllable as the original’s.

I built a simple example for moving point clouds to align two Kinects. If you need complete accuracy it may not work well, but it worked for me. It’s far more elegant than the system I used in my thesis. You can find the update here:

Some of the questions I was asking regarding this were answered in this forum post: Rotate of node around a specific point

If you are using a top-down view of the point cloud to analyze which way someone is facing, and directing them accordingly, you may find that working directly with the point cloud gets faster results than analyzing an OpenCV image of the point cloud as displayed on the screen.

I also found that, while the highest resolution was nice for graphics used in presentations, it was unnecessary for finding basic shapes or directions. I also found that using an ortho projection gave a better result than the camera view. But then, my views on what the Kinect is for differ from most people’s.
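To illustrate what working directly with the point cloud at low resolution could look like, here is a rough sketch and not the code from my example. It assumes y is up, x and z span the floor, and units are metres; the function names, cell size and height threshold are all made up. Binning points straight into a coarse grid is effectively an orthographic top-down view, with no rendering or image conversion involved.

```python
def height_map(points, x_range, z_range, cell):
    """Bin points into a top-down grid, keeping the tallest y seen in each cell."""
    cols = int(round((x_range[1] - x_range[0]) / cell))
    rows = int(round((z_range[1] - z_range[0]) / cell))
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        c = int((x - x_range[0]) // cell)
        r = int((z - z_range[0]) // cell)
        if 0 <= r < rows and 0 <= c < cols:  # ignore points outside the area
            grid[r][c] = max(grid[r][c], y)
    return grid

def occupied_centre(grid, min_height):
    """Centroid (row, col) of all cells at least min_height tall, or None."""
    cells = [(r, c) for r, row in enumerate(grid)
             for c, h in enumerate(row) if h >= min_height]
    if not cells:
        return None
    n = len(cells)
    return (sum(r for r, _ in cells) / n, sum(c for _, c in cells) / n)

# A few floor points plus one tall point standing in for a person's head.
cloud = [(-1.75, 0.02, 0.25), (1.25, 0.02, 3.25), (0.25, 1.7, 1.25),
         (9.0, 1.8, 1.0)]  # the last point is outside the tracked area
grid = height_map(cloud, x_range=(-2.0, 2.0), z_range=(0.0, 4.0), cell=0.5)
person_cell = occupied_centre(grid, min_height=1.0)
```

With 0.5 m cells an 4 m by 4 m area is only an 8 by 8 grid, so scanning it every frame costs next to nothing. A single centroid only makes sense with one person in view; several people would need the cells grouped into separate blobs first.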

BTW I was unable to get PCL to work on my macs either. I know one guy who did, but it took him a month and I think he lost twenty pounds due to stress, and he couldn’t really describe how it worked afterwards.

Thank you for sharing your approaches. For now I’ve resorted to using ofxManipulator to align the point clouds, which seems to work well enough. After this project is done I’ll see if I can get a more automated solution to work with ICP.

nosarious: yes, it’s the Xbox One Kinect. I have been trying to generate a height map and analyse that with the OpenCV contour finder, but it’s giving me some issues when converting the images. How would you directly analyse the point cloud?

On a bit of a side note: I have some more ideas that involve graphics programming/computer vision and want to learn more about how this works. What are good learning resources, considering that I have only done a little linear algebra? Is the book “Multiple View Geometry in Computer Vision” (Richard Hartley) worth looking at?

…I, uh, analyzed point clouds directly for my thesis because I didn’t know how to do what I have done in the example I linked to… Now I know that I can build the views I would need and analyze them in OpenCV. But I don’t need to anymore.

What kind of problems are you having with image conversion?

Is your thesis available somewhere? Would be interested to have a look.

I’m not sure, it was something to do with the conversion to a CV image or the contourfinder, resulting in bad access exceptions. Will see if I can find some time to replicate it in the next few days.

You can find it here:
where is every body

I included a video of the presentation which summarizes things here:

At about 4:15 or so I describe how the combined point clouds are used to analyze where people are. I’m sure there are better methods, but I needed something quickly so I just built my own. Thankfully no one judged my code. Whew.

I think the problem I was having with OpenCV and frame rates was the resolution of the image I was analyzing (a full-screen-sized image), and possibly my inexperience with the FBOs that I was making for analysis.