Acquire 3D position of an object tracked with stereo cameras

I have a fast-moving object (a ping-pong ball) that I'm tracking with two cameras.
How can I obtain the 3D position of my object from this?
I've been reading about the process behind making disparity maps with stereo cameras (I have no personal experience),
but I was looking for a cheaper way to do it, since it's a single object and the frame rate needs to be high.

If anyone has a better understanding of the issue, could you please point me in some direction? It would be appreciated.



If you are using stereo cameras, then disparity maps are the way to go. Depth cameras (like the Kinect) would be an alternative.

To get started with disparity maps, here is a good guide using OpenCV (OF has a nice wrapper):

Then I suggest that, as you are only interested in the ping-pong ball, you run an object detection algorithm and compute the disparity map only on the pixels belonging to the object. If the background is still, then background/foreground detection plus color detection for the ping-pong ball will do.
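As a rough illustration of the background/foreground idea (this is not the OpenCV API; all names here are made up): difference each frame against a stored background image, threshold the difference, and take the centroid of the foreground pixels as the ball position in that camera.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical minimal background-subtraction sketch. Grayscale frames are
// flat vectors of width * height pixels; a pixel counts as "foreground" when
// it differs from the stored background by more than a threshold. The
// centroid of the foreground pixels approximates the ball center.
struct Blob { double x, y; int count; };

Blob detectForeground(const std::vector<unsigned char>& background,
                      const std::vector<unsigned char>& frame,
                      int width, int height, int threshold) {
    double sx = 0, sy = 0;
    int n = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int i = y * width + x;
            if (std::abs(frame[i] - background[i]) > threshold) {
                sx += x; sy += y; ++n;
            }
        }
    }
    if (n == 0) return {-1.0, -1.0, 0};  // nothing detected
    return {sx / n, sy / n, n};
}
```

In practice you would add the color check (e.g. only count pixels near the ball's hue) and use OpenCV's optimized routines rather than a hand-rolled loop, but the structure is the same.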

Thanks for the reply.
I'm currently testing MATLAB's stereo calibration, which seems more straightforward.

I'm wondering if there are any explanations of the ofx process online that I haven't been able to find.

I have a few questions, if anyone has previous experience:

  1. Could I avoid the disparity-map calculation and just run a triangulation on the two (x, y) positions of the tracked points in the two cameras (assuming they have been rectified)?
    Are there any triangulation functions in the library?

  2. Would it, in the end, be worth it in terms of fps and accuracy? Meaning, if the calculations are heavy and bring me down to 30 fps, I might as well go with a Kinect.



If your environment allows it (i.e., no direct sunlight, etc.), it'll almost always be worth going with a Kinect or a similar depth camera: you save yourself a lot of work and will most likely get a higher-quality depth map out of it. I've found stereo disparity maps to be pretty noisy in the past.

I would first test with a fronto-parallel setup (a pair of identical cameras facing forward, just like human eyes). Then all you need is to calibrate the focal length f and the baseline d (i.e., the distance between the cameras). Apply blob detection on both cameras to find the ball, and let's say you found the ball center at pixels (xl, yl) and (xr, yr), assuming the origin is at the camera center. Then the ball position can be solved from xl / f = X / Z and xr / f = (X - d) / Z (and yl / f = Y / Z). It's just simple geometry.

But if you’re not sure about the camera geometry, give it a try with Kinect first.

Thanks a lot for your replies.
@hahakid good to hear from someone with experience;
the Kinect is a bit slow for my needs, but I might end up using it.

@micuat interesting. What doesn't make sense to me is: if X and Z are in world units (meters), is xl in normalized screen coordinates, i.e., 0 to 1 from left to right?


xl, yl, xr, yr don't have to be normalized, but the image center has to be the origin, e.g., -512 < x < 512 and -384 < y < 384. X, Y, Z have the same unit as the baseline d. Perhaps this article helps:
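Concretely, converting the usual top-left-origin pixel coordinates to the centered coordinates described above is just a shift by half the image size (hypothetical helper, shown for clarity):

```cpp
// Shift a top-left-origin pixel coordinate to a center-origin coordinate,
// e.g. a 1024x768 image maps x in [0, 1024) to roughly (-512, 512).
double toCentered(double pixel, int imageSize) {
    return pixel - imageSize / 2.0;
}
```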

But you'll probably need some optimization to run blob detection at 60 fps or more. For example, you can limit the search window by predicting the ball's motion.
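A minimal sketch of that prediction, assuming a constant-velocity model (all names here are hypothetical, not from OF or OpenCV): extrapolate the next center from the last two detections and only scan a small rectangle around it.

```cpp
#include <algorithm>

// Predict a search window for the next frame from the last two ball centers,
// using linear extrapolation (constant velocity), clamped to the image.
struct Rect { int x, y, w, h; };

Rect predictSearchWindow(double prevX, double prevY,
                         double curX, double curY,
                         int margin, int imgW, int imgH) {
    // next position ~= current + (current - previous)
    double nx = curX + (curX - prevX);
    double ny = curY + (curY - prevY);
    int x0 = std::max(0, static_cast<int>(nx) - margin);
    int y0 = std::max(0, static_cast<int>(ny) - margin);
    int x1 = std::min(imgW, static_cast<int>(nx) + margin);
    int y1 = std::min(imgH, static_cast<int>(ny) + margin);
    return {x0, y0, x1 - x0, y1 - y0};
}
```

If the ball isn't found inside the window (occlusion, bounce), fall back to a full-frame search for that frame.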

Hi, did you succeed in getting the pose of the tracked object?
I think you can get the 3D position using PCL via a disparity map after blob detection. For better depth, use an Intel RealSense or a ZED.