Hm … not sure if I totally understand, but in this face tracker, the process is pretty simple:
- Detect all faces via bounding boxes (the detection can be configured to use the CPU-based HOG detector – the default – or the MMOD detector, which requires a GPU for reasonable speed).
- Then a really simple spatial analysis is performed to make a best guess whether a bounding box detected in this frame is the same face as one detected in the last frame. It does this by checking whether the bounding boxes overlap or are close enough to each other.
- If the tracker determines that the bounding boxes belong to the same face (purely based on spatial analysis / proximity), it assigns them the same index (there's a rough sketch of this matching step after this list).
- When a new index is assigned, the event appears in the `onTrackBegin` callback. When an id known from a previous frame is assigned again, `onTrackUpdate` is called. When a track is lost, `onTrackEnd` is called. There are a few parameters that allow a track to be lost for a few frames and still pick the face up again if it's in the same general location as it was.
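Something along these lines is what I mean by the matching / callback flow (a minimal, self-contained sketch rather than the actual ofxDlib code – the `Track` struct, the `minOverlap` / `maxMissedFrames` parameters and the callback bodies are just placeholders for illustration):

```cpp
// Sketch of overlap-based track association (not the actual ofxDlib code).
#include <dlib/geometry.h>
#include <iostream>
#include <map>
#include <vector>

struct Track
{
    std::size_t id = 0;
    dlib::rectangle box;
    int missedFrames = 0;
};

// Overlap between two boxes as intersection-over-union in [0, 1].
double iou(const dlib::rectangle& a, const dlib::rectangle& b)
{
    const double inter = a.intersect(b).area();
    const double uni = a.area() + b.area() - inter;
    return uni > 0 ? inter / uni : 0.0;
}

class SimpleTracker
{
public:
    void update(const std::vector<dlib::rectangle>& detections)
    {
        std::vector<bool> used(detections.size(), false);

        // Match each existing track to the best overlapping detection.
        for (auto& entry : tracks)
        {
            Track& track = entry.second;
            int best = -1;
            double bestScore = minOverlap;
            for (std::size_t i = 0; i < detections.size(); ++i)
            {
                if (used[i]) continue;
                const double score = iou(track.box, detections[i]);
                if (score > bestScore) { bestScore = score; best = int(i); }
            }

            if (best >= 0)
            {
                used[best] = true;
                track.box = detections[best];
                track.missedFrames = 0;
                onTrackUpdate(track);
            }
            else
            {
                ++track.missedFrames; // no match, but keep the track alive for a bit
            }
        }

        // Any unmatched detection starts a new track.
        for (std::size_t i = 0; i < detections.size(); ++i)
        {
            if (used[i]) continue;
            Track t;
            t.id = nextId++;
            t.box = detections[i];
            onTrackBegin(t);
            tracks[t.id] = t;
        }

        // Drop tracks that have been missing for too long.
        for (auto it = tracks.begin(); it != tracks.end(); )
        {
            if (it->second.missedFrames > maxMissedFrames)
            {
                onTrackEnd(it->second);
                it = tracks.erase(it);
            }
            else ++it;
        }
    }

private:
    void onTrackBegin(const Track& t)  { std::cout << "begin "  << t.id << "\n"; }
    void onTrackUpdate(const Track& t) { std::cout << "update " << t.id << "\n"; }
    void onTrackEnd(const Track& t)    { std::cout << "end "    << t.id << "\n"; }

    std::map<std::size_t, Track> tracks;
    std::size_t nextId = 0;
    double minOverlap = 0.1;   // how much overlap counts as "the same face"
    int maxMissedFrames = 5;   // how long a lost track survives before onTrackEnd
};
```

Each frame you'd pass whatever rectangles the HOG or MMOD detector returned into `update()`, and the begin / update / end callbacks fire as tracks appear, continue, and disappear.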
To get to the question I think you are asking – in other face trackers (e.g. https://github.com/kylemcdonald/ofxFaceTracker), the process is:
- Detect a face using a haar detector.
- Then stop using the haar detector (it's slow) and let the CLM tracker take over (this is fast). So, it's a different architecture from the way it's done in ofxDlib (there's a little sketch of that detect-then-track pattern below).
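Roughly, the difference in control flow looks like this (just a skeleton to show the structure – `HaarDetector`, `ClmTracker` and `FaceFollower` are hypothetical stand-ins, not ofxFaceTracker's actual classes):

```cpp
// Sketch of the detect-once-then-track pattern (ofxFaceTracker-style).
// The types below are stand-ins; the point is only the control flow:
// slow detection until a face is found, then fast per-frame tracking.
#include <optional>

struct Frame {};                    // stand-in for a camera frame
struct Box { int x, y, w, h; };     // stand-in for a face bounding box

struct HaarDetector                 // stand-in for the (slow) haar detector
{
    std::optional<Box> detect(const Frame&) { return Box{0, 0, 100, 100}; }
};

struct ClmTracker                   // stand-in for the (fast) CLM tracker
{
    void start(const Frame&, const Box&) {}
    bool track(const Frame&) { return true; }   // false once the face is lost
};

struct FaceFollower
{
    void onNewFrame(const Frame& frame)
    {
        if (!tracking)
        {
            // Slow path: run the detector only while we have no face.
            if (auto face = detector.detect(frame))
            {
                tracker.start(frame, *face);
                tracking = true;
            }
        }
        else
        {
            // Fast path: the tracker follows the face frame to frame,
            // and we only fall back to detection when it loses the face.
            tracking = tracker.track(frame);
        }
    }

    HaarDetector detector;
    ClmTracker tracker;
    bool tracking = false;
};

int main()
{
    FaceFollower follower;
    for (int i = 0; i < 10; ++i)
        follower.onNewFrame(Frame{});
}
```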
https://github.com/kylemcdonald/ofxFaceTracker2, which also uses dlib, uses an approach like the one I use in ofxDlib (always detecting, then figuring out which bounding boxes belong to the same id).
In some future version, instead of doing tracking based on spatial characteristics alone, one could use the `facecode` available in ofxDlib to do simple "recognition"-based tracking.
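For example, dlib's face recognition network gives you a 128D descriptor per face, and descriptors of the same person are usually within a Euclidean distance of about 0.6 of each other. The matching step could then compare descriptors instead of (or in addition to) box overlap – something like this (just a sketch of the idea, not existing ofxDlib code; `FaceCode`, `Track` and `matchByFaceCode` are illustrative names):

```cpp
// Sketch of descriptor-based ("recognition") track matching, not existing ofxDlib code.
// Assumes each track and each new detection already has a 128D face descriptor
// (e.g. from dlib's dnn face recognition model).
#include <dlib/matrix.h>
#include <cstddef>
#include <vector>

using FaceCode = dlib::matrix<float, 0, 1>;   // 128D face descriptor

struct Track
{
    std::size_t id;
    FaceCode code;
};

// Returns the id of the track whose descriptor is closest to the detection's,
// or -1 if nothing is close enough to call it the same person.
long matchByFaceCode(const FaceCode& detection,
                     const std::vector<Track>& tracks,
                     double maxDistance = 0.6)   // dlib's usual "same person" threshold
{
    long bestId = -1;
    double bestDistance = maxDistance;

    for (const auto& track : tracks)
    {
        // Euclidean distance between the two 128D descriptors.
        const double d = dlib::length(detection - track.code);
        if (d < bestDistance)
        {
            bestDistance = d;
            bestId = static_cast<long>(track.id);
        }
    }
    return bestId;
}
```

In practice you'd probably still do the cheap spatial check first and only reach for the descriptors when it's ambiguous, since running the recognition network on every detection every frame isn't free.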