I’ve ported an old project which is essentially ripped from the OpenCV examples on feature-based object detection. I think it is a fairly standard/powerful approach, and OpenCV makes it very simple to experiment with many similar algorithms through its templated feature/descriptor/matcher routines.
It can be used for quite interesting applications other than just detection, of course. I have been using it to work on resynthesis algorithms, where I use object detection together with a database of objects to resynthesize video: concatenative synthesis meets video. Regurgitating Svankmajer’s “Dimensions of Dialogue” after having eaten Svankmajer’s “Food”:
So for the Dimensions video you posted:
You analysed Jan Svankmajer’s “Food”, took all of the objects out of that film and put them into a database, then analysed “Dimensions of Dialogue” and recreated it using similar-looking objects from the database you made from “Food”?
It’s very fitting to the themes he explores in his own work, very cool project!
I bet it did take a long time. If it is an automated process, how does it look at a frame and say “this is an object”? Or are you looking at the movie to be recreated first and then scouring the source imagery for a good match for certain pixel blobs?
Also, if you don’t mind a bit of critique: I think the resulting video would have a much stronger impact if you could leave some of the images on screen for longer. As it is now they seem to be replaced every frame, so it is difficult to tell where they originally came from. Maybe the compression is killing it too.
Both actually. The first pass is the creation of the database, where it finds objects in each frame. The second pass is the resynthesis, again finding objects, but matching them to those in its database.
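To make the two passes concrete, here is a toy sketch of the database/resynthesis structure. The descriptor vectors, sizes, and object names here are all hypothetical stand-ins (random 64-d vectors instead of real SURF descriptors), and I’m using a plain NumPy nearest-neighbour search in place of the real matcher:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pass 1: database creation. One descriptor vector per object found
# in the source film (hypothetical 64-d vectors stand in for real ones).
database = {f"food_obj_{i}": rng.normal(size=64) for i in range(50)}
names = list(database)
db_matrix = np.stack([database[n] for n in names])

def best_match(query_desc):
    """Pass 2: resynthesis. Find the database object whose descriptor
    is closest to one extracted from the film being recreated."""
    dists = np.linalg.norm(db_matrix - query_desc, axis=1)
    return names[int(np.argmin(dists))]

# A noisy copy of object 7's descriptor should map back to object 7.
query = database["food_obj_7"] + rng.normal(scale=0.05, size=64)
print(best_match(query))  # → food_obj_7
```

Each matched object then gets composited into the output frame in place of the region it matched.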
Excellent point and one I am working on currently. This essentially entails finding video segments in 3d and I have a few ideas for this that should make the whole process much faster and perceptually more continuous.
Yes! I have yet to upload any preliminary results, but the idea is simple. Rather than doing the FLANN-based vector-to-vector matching, I do sequence matching and concatenate vectors together to create a spatio-temporal descriptor.
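The concatenation idea can be sketched in a few lines. Again the dimensions and data here are made up for illustration (5 frames per segment, 32-d per-frame descriptors); the point is only that matching happens at the segment level rather than frame by frame:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 5, 32  # frames per segment, descriptor length per frame (hypothetical)

def spatiotemporal(frame_descs):
    # Concatenate per-frame descriptors into one segment-level vector,
    # so nearest-neighbour search operates on whole short sequences.
    return np.concatenate(frame_descs)

# Database of segment descriptors instead of single-frame ones.
segments = [spatiotemporal([rng.normal(size=D) for _ in range(T)])
            for _ in range(20)]
seg_matrix = np.stack(segments)

# A noisy copy of segment 3 should retrieve segment 3.
query = segments[3] + rng.normal(scale=0.05, size=T * D)
dists = np.linalg.norm(seg_matrix - query, axis=1)
print(int(np.argmin(dists)))  # → 3
```

Because a whole segment is retrieved at once, the matched imagery persists for several frames instead of flickering every frame, which is exactly the perceptual-continuity problem raised above.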
What is the size of the image you are running on? You may find reducing the image size is an easy way to speed up the algorithm at not too much cost to performance. I was running on a 320x240 image. You could also try “FAST” instead of “DynamicSURF” for the FeatureDetector (see inside pkmDetector’s constructor), although the results will be less stable. Likewise, you could try “NONE_FILTER” instead of “CROSS_CHECK_FILTER”, though again the results will be less stable.