Hey, I am stunned that it even detects something… I think at some point I played around with the dimensions and it didn't work at all, so I was a little too pessimistic.
So your approach might work if you know that in total you won't have more than 6 people in the merged image, e.g. there are 2 people and some cameras will never see the same content. Good thinking!
What about using a light model like NanoDet to detect people, and then passing only the person's area to MoveNet, to minimize the pixels sent to the TF model? You can run the people detection on the CPU and leave the GPU for MoveNet.
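For anyone trying the merged-image approach, here is a minimal sketch of mapping a detected point back to the camera it came from. It assumes the camera frames are tiled into a regular grid of equally sized tiles, row by row; the function name and the tile layout are my own assumptions, not something from the addon.

```python
def split_merged_point(x, y, tile_w, tile_h, cols, rows):
    """Map a pixel in the merged image to (camera index, local x, local y).

    Assumes frames are tiled row by row into a cols x rows grid and that
    every tile is tile_w x tile_h pixels (hypothetical layout).
    """
    # Clamp so points exactly on the right/bottom edge stay in the last tile.
    col = min(int(x // tile_w), cols - 1)
    row = min(int(y // tile_h), rows - 1)
    camera_index = row * cols + col
    return camera_index, x - col * tile_w, y - row * tile_h


# Four 640x480 cameras in a 2x2 grid: a point at (700, 500) in the merged
# image belongs to the bottom-right camera.
print(split_merged_point(700, 500, 640, 480, cols=2, rows=2))
```

If the model returns normalized coordinates, multiply them by the merged image size first.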
One other tip is to make sure you have cuDNN (NVIDIA's CUDA Deep Neural Network library) installed and enabled; that seems to give quite a big performance boost for some approaches. See "NVIDIA cuDNN" on the NVIDIA Developer site.
…and that the (pre)compiled libtensorflow being used supports CUDA on the system…
Absolutely. I was struggling a bit with TF/CUDA/cuDNN versions for a while before I was able to get anything running.
I’m going to make a PR for the README when I get everything working.
Also, if you’re on Windows, adding the following to the addon_config.mk saves a lot of time.
ADDON_LIBS += /libs/tensorflow/lib/tensorflow.lib
Oh, one more thought: presumably I can just throw more GPUs at the problem, no?
Is there a way to configure which model uses which GPU?
That’s an interesting idea. I have 2 questions:
- I was under the impression that this is how most pose estimation models already work: by finding all of the bounding boxes of things that roughly look like a human and then only analyzing those pixels to find the rest of the features… But perhaps you are suggesting that whatever is doing that in MoveNet isn't as efficient as NanoDet?
- Is NanoDet available as a TF SavedModel? I checked https://tfhub.dev/ and https://modelzoo.co/
Yes, that's how pose models work, but the key here is how many pixels you send the model to search, as the pose model is much more “hungry” than pure detection. The fewer pixels you give it, the faster it should be.
NanoDet was just an example of a light YOLO-style detector… there are lots, and you don't need to do that inference in TF; you can use, for example, OpenVINO or ncnn (the latter is my latest favorite).
But this is just an idea, I have not tested it myself. I used NanoDet in a recent project and it works OK, even the provided COCO model. You can do the test and, if you are happy with it, train a new one for people detection only.
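To make the crop-then-pose idea concrete, here is a rough sketch of the bookkeeping around the two models. It does not call NanoDet or MoveNet at all. It only shows how to pad a detector box, clamp it to the frame, and map keypoints from the crop back to full-frame pixels. The function names, the padding factor, and the assumption that keypoints come back as normalized (x, y) pairs relative to the crop are mine; check the actual output layout of the model you use.

```python
def padded_crop_box(box, frame_w, frame_h, pad=0.1):
    """Grow a detector box (x0, y0, x1, y1) by `pad` of its size on each
    side and clamp it to the frame, so limbs near the edge are not cut."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * pad
    dy = (y1 - y0) * pad
    return (max(0, int(x0 - dx)), max(0, int(y0 - dy)),
            min(frame_w, int(x1 + dx)), min(frame_h, int(y1 + dy)))


def keypoints_to_frame(keypoints, crop_box):
    """Convert normalized (x, y) keypoints inside the crop to frame pixels.

    Assumes the crop was passed to the pose model without letterboxing.
    """
    x0, y0, x1, y1 = crop_box
    w, h = x1 - x0, y1 - y0
    return [(x0 + kx * w, y0 + ky * h) for kx, ky in keypoints]


# Hypothetical person box from the light detector on a 640x480 frame.
crop = padded_crop_box((100, 100, 200, 300), 640, 480, pad=0.5)
print(crop)
# A keypoint in the middle of the crop, mapped back to the full frame.
print(keypoints_to_frame([(0.5, 0.5)], crop))
```

Only the pixels inside `crop` would be sent to the pose model, which is where the expected speedup comes from.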