Crowd density tracking with computer vision

I am working on a crowd-controlled sound system for a music festival. The music would be controlled both by individuals and by the crowd as a whole.

While searching for crowd tracking techniques, I stumbled upon this one: http://www.mikelrodriguez.com/crowd-analysis/#density; Matlab code and a dataset are enclosed. Are you aware of similar techniques, maybe simpler ones, based e.g. on blob detection? Do you have an idea of how well this one would perform in a real-time scenario?

It appears that it might work alright in real time, but I’m wondering if you could give us a bit more information about the kinds of “motion” you want (and expect) to get from the crowd. What is the camera position? Are the tracked people in transit through the space, or are they mostly standing in the same place watching performers? The algorithms you cited are good at creating crowd density models for crowds in transit, but for a more “static” crowd I’m not sure that it would be the best approach.

If you’re looking for more of a meta movement map, you might look into a dense optical flow tracker like ofxOpticalFlowFarneback. Then you could get the “pulse” of the crowd and generate a density map. Depending on the lighting situation, though, you may end up just tracking spotlights on the audience … so in that case you might use an IR camera approach. Anyway, once you get a sense of the movement density from an optical flow approach, you could use it to bootstrap/seed a Haar-cascade face detector, which could in turn seed an ofxFaceTracker (if you have good enough lighting and camera resolution), which would give you some micro-level tracking options … Anyway, more info about your particular situation might help :)
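To make the optical-flow idea concrete, here is a minimal sketch in plain OpenCV (cv::calcOpticalFlowFarneback is the same algorithm that ofxOpticalFlowFarneback wraps). The camera index, the 16x16 grid size, and the downscaled resolution are assumptions that would need tuning on site; it just shows how a per-cell motion map and an overall “pulse” value could be derived from dense flow.

```cpp
// Sketch: per-cell motion "energy" from dense Farneback optical flow.
// Assumptions: camera index 0, a 16x16 activity grid, 320x240 working resolution.
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::VideoCapture cap(0);                 // assumed camera index
    if (!cap.isOpened()) return 1;

    cv::Mat frame, gray, prevGray, flow;
    const int gridX = 16, gridY = 16;        // coarse crowd-activity grid

    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::resize(gray, gray, cv::Size(320, 240));   // keep it real-time friendly

        if (!prevGray.empty()) {
            cv::calcOpticalFlowFarneback(prevGray, gray, flow,
                                         0.5, 3, 15, 3, 5, 1.2, 0);

            // Split flow into x/y components and take per-pixel magnitude.
            cv::Mat xy[2], mag, ang;
            cv::split(flow, xy);
            cv::cartToPolar(xy[0], xy[1], mag, ang);

            // Average magnitude per grid cell = a crude motion-density map.
            cv::Mat density;
            cv::resize(mag, density, cv::Size(gridX, gridY), 0, 0, cv::INTER_AREA);

            // Overall "pulse" of the crowd: mean motion over the whole frame.
            double pulse = cv::mean(mag)[0];
            std::cout << "pulse: " << pulse << std::endl;

            cv::imshow("motion", mag / 20.0);  // quick visual check (scale is arbitrary)
            // The 16x16 density map could be forwarded to the sound engine, e.g. over OSC.
        }
        prevGray = gray.clone();
        if (cv::waitKey(1) == 27) break;       // Esc to quit
    }
    return 0;
}
```

On a downscaled frame like this, Farneback flow typically runs at interactive rates on a laptop CPU, which is why the resize happens before the flow computation rather than after.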

Cheers for the answer.
We are developing the concept as we go, exploring technologies along the way, but the idea is that the music would be controlled by the crowd as a whole. The performer is the crowd here. It would be for 500+ people, so individual face tracking is not an option if only one camera is used for that (from a top view). I am not too sure about the lighting conditions or what kind of camera should be used just yet.

We also want people to be able to interact with the sounds on an individual scale. One option that was suggested was, for example, to use Estimote beacons, which allow tracking the spatial position of smartphones via Bluetooth Low Energy. Smartphone sensors could also be used to generate movement data. I see a big scalability problem here though, on the side of the WLAN router receiving the phone sensor data, and also a compatibility problem with older devices that lack Bluetooth Low Energy. That’s why I wanted to learn about CV techniques.
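For the phone-sensor path, a hypothetical sketch of the receiving side might look like the following: a single UDP socket aggregating accelerometer packets from many phones into one crowd “energy” value. The port number and the "deviceId,ax,ay,az" payload format are pure assumptions (there is no agreed protocol here); the point is only that the server-side aggregation itself is cheap, so the bottleneck would indeed be the WLAN, not the processing.

```cpp
// Hypothetical sketch: aggregate per-phone accelerometer packets into one value.
// Assumed payload "deviceId,ax,ay,az" on UDP port 9000 (both are placeholders).
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cmath>
#include <cstdio>
#include <string>
#include <unordered_map>

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(9000);                         // assumed port
    bind(sock, (sockaddr*)&addr, sizeof(addr));

    std::unordered_map<std::string, float> lastMagnitude; // latest motion per phone
    char buf[256];

    while (true) {
        ssize_t n = recv(sock, buf, sizeof(buf) - 1, 0);
        if (n <= 0) continue;
        buf[n] = '\0';

        // Expected (assumed) payload: "deviceId,ax,ay,az"
        char id[64];
        float ax, ay, az;
        if (sscanf(buf, "%63[^,],%f,%f,%f", id, &ax, &ay, &az) != 4) continue;
        lastMagnitude[id] = std::sqrt(ax * ax + ay * ay + az * az);

        // Crowd energy = mean acceleration magnitude over all phones seen so far.
        float sum = 0.f;
        for (const auto& kv : lastMagnitude) sum += kv.second;
        float energy = sum / lastMagnitude.size();
        printf("phones: %zu  crowd energy: %.2f\n", lastMagnitude.size(), energy);
    }
    close(sock);
    return 0;
}
```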

