A couple of my students encountered an issue with a slow app, and I’m having trouble figuring out what’s going on.
They are trying to port this Processing sketch that renders metaballs by directly manipulating the pixel values of an image: https://github.com/CodingTrain/website/tree/master/CodingChallenges/CC_028_MetaBalls/Processing/CC_028_MetaBalls
This runs at about 60 fps on my PC.
We ported this to OF and it runs at about 5 fps in Debug and 40 fps in Release, which I find really surprising. Adding any more blobs or increasing the image size makes it break down completely.
Here’s a gist: https://gist.github.com/prisonerjohn/730de0153eb6cc963d34f0a1abfc6546
Some of the things I’ve tried so far:
- Calculating distance manually instead of using glm helps but not by much.
- Disabling mipmaps doesn’t help.
- Ping-ponging the textures for drawing doesn’t help.
We tested this on both Mac/Xcode and Windows/VS.
I’m a bit stumped, wondering if anyone has any ideas? Thanks in advance!
Running your code in Release gives me about 80 fps. Just by moving some function calls out of the loop, I get around 155 fps.
Here is the code:
// Query these once, outside the loops, instead of on every iteration.
auto pixels = canvas.getPixels().getData();
auto width = ofGetWidth();
auto height = ofGetHeight();
for (int x = 0; x < width; x++) {
    for (int y = 0; y < height; y++) {
        int index = x + y * width;
        glm::vec2 vec(x, y);
        float sum = 0;
        // Accumulate each blob's contribution at this pixel.
        for (Blob& b : blobs) {
            sum += 10 * b.r / glm::distance(vec, b.pos);
        }
        pixels[index] = std::min(255.0f, sum);
    }
}
Also, you will get A LOT more fps if you do this in a shader.
I have found calls to ofGetWidth(), ofGetHeight() and ofGetElapsedTimef() to be a little slow, which can be painful inside of for loops. Perhaps you can look there first? Looks like that’s something marco91i optimized.
Also, on OS X you can use the Time Profiler in Xcode to see what your code is spending time on. It’s usually how I suss out slow function calls.
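If a profiler isn't handy, a crude cross-check is to time a block of code yourself. Here is a minimal sketch using only the standard library; `elapsedMs` is a made-up helper name for illustration, not an openFrameworks function.

```cpp
#include <chrono>

// Hypothetical helper: runs fn once and returns the wall-clock time it took,
// in milliseconds. steady_clock is used because it never jumps backwards.
template <typename Fn>
double elapsedMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the pixel loop in a lambda and passing it to this helper, once with the ofGetWidth() and ofGetHeight() calls inside the loop and once with them hoisted out, should show roughly how much those calls cost. A single run is noisy, so average over a few frames.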
Amazing, thank you @marco_v, I had never realized that calls to ofGetHeight() were so expensive. That seems to do the trick.
FWIW, the first thing I thought too was to just do this in a shader but that’s not covered in this course so I wanted to keep it all CPU bound.
@zach Thanks for the tip! I did a bit of profiling in VS but it kept pointing to the glm::distance call as the culprit… Maybe I’m misreading it.
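For anyone following along, here is a rough sketch of what "calculating distance manually" from the first post could look like without glm. `Vec2` and `manualDistance` are made-up names so the snippet compiles on its own; in the actual sketch you would pass glm::vec2 values instead.

```cpp
#include <cmath>

// Hypothetical stand-in for glm::vec2, only here to keep the sketch standalone.
struct Vec2 {
    float x;
    float y;
};

// Euclidean distance written out by hand instead of calling glm::distance.
inline float manualDistance(const Vec2& a, const Vec2& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}
```

As noted in the first post, this helped but not by much, so it is worth measuring before assuming the distance call is the real bottleneck.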
@prisonerjohn I’m teaching with openFrameworks as well. I sometimes do a pretty similar exercise, and afterwards I let them do the same with a shader, just to see the performance increase.
I got curious with your example and just wrote a shader that does the same, and I get around 3500 fps. In case you’re interested, here’s the code:
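// Pack x, y, r for each blob into v (a float array of at least blobs.size() * 3 elements).
for (int i = 0; i < blobs.size(); i++) {
    v[3 * i] = blobs[i].pos.x;
    v[3 * i + 1] = blobs[i].pos.y;
    v[3 * i + 2] = blobs[i].r;
}
// 10 is the number of blobs; it has to match the array size in the shader.
shader.setUniform3fv("blobs", v, 10);
ofDrawRectangle(0, 0, ofGetWidth(), ofGetHeight());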
// fragment shader
#version 150
uniform vec3 blobs[10]; // x, y = position, z = radius
out vec4 outputColor;

void main() {
    float color = 0.0;
    float sum = 0.0;
    for (int i = 0; i < 10; ++i) {
        vec2 blobPos = vec2(blobs[i].x, blobs[i].y);
        vec2 fragPos = vec2(gl_FragCoord.x, gl_FragCoord.y);
        sum += 10.0 * blobs[i].z / distance(blobPos, fragPos);
    }
    color = min(255.0, sum) / 255.0;
    outputColor = vec4(color, color, color, 1.0);
}
I would also swap the x and y for loops so that you iterate over the pixels one after another, in the order they are laid out in memory. As it is, you are jumping around in memory, which can slow things down a lot depending on what you are doing inside the loop.
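A rough sketch of the swapped loops, assuming a single-channel, row-major buffer (index = x + y * width) like the one in the code earlier in the thread. `Blob` and `renderMetaballs` here are simplified stand-ins made up for illustration, not the classes from the gist, and the distance is computed by hand so the snippet has no glm dependency.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sketch's Blob: a position plus a radius.
struct Blob {
    float x;
    float y;
    float r;
};

// Fills a grayscale buffer with the metaball field.
// y is the outer loop, so the inner x loop touches consecutive bytes.
void renderMetaballs(std::vector<unsigned char>& pixels, int width, int height,
                     const std::vector<Blob>& blobs) {
    pixels.resize(static_cast<std::size_t>(width) * height);
    std::size_t index = 0; // always equals x + y * width
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x, ++index) {
            float sum = 0.0f;
            for (const Blob& b : blobs) {
                const float dx = x - b.x;
                const float dy = y - b.y;
                sum += 10.0f * b.r / std::sqrt(dx * dx + dy * dy);
            }
            // Clamp to the 8-bit range before storing.
            pixels[index] = static_cast<unsigned char>(std::min(255.0f, sum));
        }
    }
}
```

With the loops in this order the index simply increments by one each step, so the multiply in x + y * width can go away as well. How much this helps depends on the image size and on how heavy the inner blob loop is.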