Hi!
I’m working on a project these days where we’re going to do real-time tracking of people in a tight space, with a low roof and moving obstacles (we will be moving these obstacles ourselves, so we know where they are at all times with a precision of ~10 mm). I have had success with the usual depth map -> threshold -> blob-tracking approach before, but this time around we will have to go for something more sophisticated.
I have done some research already (mostly theory) and I want to share what I’ve learned so far, and hopefully gain some insight from you girls and guys before we start reinventing the wheel over here. It would be great if this thread could turn into a one-stop resource for multiple-Kinect setups with openFrameworks. (I have numbered the places in my post where I actually have questions.)
Hardware
We are planning to use four “Kinect for Windows” devices, as this project is already Windows-specific and my experience with using Xbox Kinects and libfreenect on Windows is not the best. Basically it’s proven to be quite unstable (slow startup, several reconnections before receiving data, random crashes). It would be great to hear if anyone has tips for improved stability, though we are already set on using the official drivers for this particular project.
I have read several posts describing issues with connecting multiple Kinects to one computer, and it seems that the number of Kinects that can be connected is directly linked to the number of internal USB buses on the system. I have seen claims that one bus can manage two Kinect streams, while others claim you need a dedicated bus per unit. Can anyone confirm?
To solve this issue we are looking at getting a quad-bus usb card like this: http://www.unibrain.com/products/dual-bus-usb-3-0-pci-express-adapter/
It is a USB 3.0 card, but it claims to have “Legacy connectivity support for USB 2.0”. This might or might not be an issue. Does anyone have any experience with this? If some of you have a working setup with four Kinects on a Windows machine, would you mind sharing some details on the hardware setup?
Software
Our plan is to use (and maybe also contribute to?) the ofxKinectCommonBridge add-on. It says it only supports 32-bit Windows for now, and we are going to run 64-bit Windows. James George mentioned that an update for it was on the way. If so, what’s holding it back? Maybe we can help?
I’m also missing a method for converting from depth-map coordinates to 3D world coordinates, like the getWorldCoordinateAt() method from ofxKinect. Is there an equivalent function in the official Kinect SDK, or do we need to implement it ourselves? If so, does anyone have experience with which implementation yields the best approximation? (There’s one implementation described here: http://openkinect.org/wiki/Imaging_Information but I haven’t tried it myself yet.)
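In case it’s useful, here’s roughly what that conversion would look like in oF: a plain pinhole back-projection from a depth pixel to camera-space metres. The intrinsics are the approximate values listed on that wiki page, so they are placeholders; a real setup should use per-device calibrated values.

```cpp
// Minimal sketch of depth-pixel -> camera-space back-projection (pinhole model).
// fx, fy, cx, cy are the approximate Kinect depth intrinsics from the openkinect
// wiki linked above; treat them as placeholders and calibrate for real use.
#include "ofMain.h"

ofVec3f depthToWorld(float px, float py, float depthMeters) {
    const float fx = 594.21f; // focal length x, in pixels (approximate)
    const float fy = 591.04f; // focal length y, in pixels (approximate)
    const float cx = 339.31f; // principal point x (approximate)
    const float cy = 242.74f; // principal point y (approximate)

    return ofVec3f((px - cx) * depthMeters / fx,
                   (py - cy) * depthMeters / fy,
                   depthMeters);
}
```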
Setup & Calibration
Our plan is to place one Kinect in each corner of a room, facing inwards, basically trying to cover as much of the room’s volume as possible. By extracting point clouds from each of the Kinects and projecting them into the same coordinate space, we are hoping to reconstruct the geometry of people in the room as accurately as possible, while avoiding the usual “shadowing” effect you get from using only one Kinect. (We will also have another layer of sensory information coming from a capacitive floor, but I’m not going to include that in this discussion.)
As the installation will be exhibited at at least two different locations, we will have to make some sort of calibration routine for positioning and aligning the Kinects. Here I’m quite inexperienced and would love some input on the best approach. Do we physically measure and input the position and angles of the Kinects relative to each other? Do we do some sort of reference-object calibration routine (checkerboard or similar)? Has anyone tried an ICP approach? (http://en.wikipedia.org/wiki/Iterative_closest_point)
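For reference, this is roughly how I imagine the merging step, assuming we end up with one 4x4 extrinsic matrix per Kinect that maps its camera space into a shared room space. The kinectToRoom matrices below are hypothetical and would come from whatever calibration we land on:

```cpp
// Sketch of merging per-Kinect point clouds into one room-space cloud.
// kinectToRoom[i] is a hypothetical extrinsic matrix (from calibration) mapping
// Kinect i's camera space into the shared room coordinate system.
#include "ofMain.h"
#include <vector>

std::vector<ofVec3f> mergeClouds(const std::vector< std::vector<ofVec3f> >& clouds,
                                 const std::vector<ofMatrix4x4>& kinectToRoom) {
    std::vector<ofVec3f> merged;
    for (size_t i = 0; i < clouds.size(); ++i) {
        for (size_t j = 0; j < clouds[i].size(); ++j) {
            // ofMatrix4x4 uses the row-vector convention, so point * matrix
            // applies the transform to the point
            merged.push_back(clouds[i][j] * kinectToRoom[i]);
        }
    }
    return merged;
}
```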
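If we go the ICP route, my understanding is that PCL already has this built in, so something along these lines could work as a refinement step on top of a rough manual alignment (just a sketch, not tested on real data):

```cpp
// Sketch of refining the alignment between two overlapping Kinect clouds with
// PCL's iterative closest point, assuming a rough measured transform has already
// been applied so ICP only has to correct small residual errors.
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

Eigen::Matrix4f refineAlignment(pcl::PointCloud<pcl::PointXYZ>::Ptr source,
                                pcl::PointCloud<pcl::PointXYZ>::Ptr target) {
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(source);   // cloud from the Kinect we want to align
    icp.setInputTarget(target);   // cloud we align against (e.g. a reference Kinect)
    icp.setMaximumIterations(50);

    pcl::PointCloud<pcl::PointXYZ> aligned;
    icp.align(aligned);           // runs ICP and writes the transformed source cloud

    // 4x4 correction to apply on top of the rough initial transform
    return icp.getFinalTransformation();
}
```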
I have done some tests with overlapping Kinects and, to my surprise, the interference is not that bad. If it becomes a problem we might consider trying the vibration trick (see page 6 of this paper: http://www.matthiaskronlachner.com/wp-content/uploads/2013/01/2013-01-07-Kronlachner-Kinect.pdf) or go to the extreme of triggering the readings in sequence (see https://isas.uka.de/Publikationen/IROS12_Faion.pdf). We are hoping to avoid all of this by not pointing the Kinects directly at each other.
Tracking & predicting movement
The information we are mostly interested in is the position, height and roughly the volume of each person in the room. I haven’t worked much with point cloud data in this sense before (except performing a brute-force distance calculation on all the points to find the closest one), so any pointers you guys have on how to work with this data in an effective way are very welcome. I’m already looking at PCL (http://www.pointclouds.org) for working with the point cloud data, and I’ve been tipped off that there’s already an ofxPCL out there, so we will probably look into that. Another option I’m thinking about is generating some kind of height map based on all the data and performing normal OpenCV-based blob tracking on it (rough sketch below). That will probably require quite a lot of cleverness, though. Any takers?
We will also have to try to predict where people will be in about 500 ms. I’m thinking an approximation based on the velocity/trajectory of each person will do? If anyone has experience with anything similar it would be awesome to get some details!
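As an illustration of the height-map idea: rasterize the merged room-space cloud onto a top-down grid, keep the tallest point per cell, then run the familiar threshold + contour-finder pipeline on the result. Room size, grid resolution and the y-up convention are all placeholder assumptions here:

```cpp
// Rough sketch of turning a merged room-space cloud into a top-down height map
// that the usual ofxOpenCv threshold + contour-finder pipeline can run on.
// Room dimensions, grid resolution and the y-up axis are made-up placeholders.
#include "ofMain.h"
#include "ofxOpenCv.h"
#include <algorithm>
#include <vector>

void buildHeightMap(const std::vector<ofVec3f>& roomPoints,
                    ofxCvGrayscaleImage& heightMap) {
    const float roomWidth = 6.0f;   // metres, placeholder
    const float roomDepth = 6.0f;   // metres, placeholder
    const float maxHeight = 2.2f;   // low ceiling, placeholder
    const int   gridW = 320, gridH = 320;

    if (heightMap.getWidth() != gridW || heightMap.getHeight() != gridH) {
        heightMap.allocate(gridW, gridH);
    }

    std::vector<unsigned char> pixels(gridW * gridH, 0);
    for (size_t i = 0; i < roomPoints.size(); ++i) {
        const ofVec3f& p = roomPoints[i];              // assumes y is up in room space
        int gx = ofClamp(p.x / roomWidth * gridW, 0, gridW - 1);
        int gy = ofClamp(p.z / roomDepth * gridH, 0, gridH - 1);
        unsigned char h = ofClamp(p.y / maxHeight * 255.0f, 0, 255);
        pixels[gy * gridW + gx] = std::max(pixels[gy * gridW + gx], h); // keep tallest point per cell
    }
    heightMap.setFromPixels(pixels.data(), gridW, gridH);
}

// Then e.g.:
//   heightMap.threshold(100);  // keep anything taller than ~85 cm
//   contourFinder.findContours(heightMap, 500, gridW * gridH / 4, 10, false);
```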
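For the prediction, the simplest thing I can think of is a constant-velocity extrapolation like the one below. A Kalman filter would probably give smoother results, but this matches the approximation I described:

```cpp
// Minimal constant-velocity prediction sketch: estimate velocity from the last
// two tracked positions and extrapolate 500 ms ahead. The struct is hypothetical,
// just whatever per-person state the tracker ends up keeping.
#include "ofMain.h"

struct TrackedPerson {
    ofVec2f prevPos;   // position one frame ago (floor plane)
    ofVec2f currPos;   // current position
    float   dt;        // seconds between the two samples
};

ofVec2f predictPosition(const TrackedPerson& person, float lookaheadSeconds = 0.5f) {
    ofVec2f velocity = (person.currPos - person.prevPos) / person.dt;
    return person.currPos + velocity * lookaheadSeconds;
}
```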
It would be amazing if some of you have some input! I’ll try to keep this thread updated as we make discoveries that can be valuable to others.
Thanks!
-Bjørn