Optimization Tips

Hey, I’m trying to optimize the speed of this reaction-diffusion program. It currently runs at about 2 Hz. Any tips would be appreciated!

#include "ofApp.h"

class Cell {
public:
	float a;
	float b;

	Cell(float a_, float b_) {
		a = a_;
		b = b_;
	}

	virtual Cell* Clone()
	{
		return new Cell(*this);
	}
};

std::vector<std::vector<Cell*>> grid;
std::vector<std::vector<Cell*>> prevGrid;

ofImage img;

float appWidth;
float appHeight;
//--------------------------------------------------------------
void ofApp::setup(){
	ofSetFrameRate(60);
	
	appWidth = ofGetWidth();
	appHeight = ofGetHeight();
	
	//img.allocate(ofGetWidth(), ofGetHeight(), OF_IMAGE_GRAYSCALE);

	grid.resize(ofGetWidth());
	prevGrid.resize(ofGetWidth());
	for (int i = 0; i < ofGetWidth(); i++) {
		grid[i].resize(ofGetHeight());
		prevGrid[i].resize(ofGetHeight());
		for (int j = 0; j < ofGetHeight(); j++) {
			float a = 1;
			float b = 0;
			grid[i][j] = new Cell(a, b);
			prevGrid[i][j] = new Cell(a, b);
		}
	}

	for (int n = 0; n < 10; n++) {
		int startx = (int)ofRandom(20, ofGetWidth() - 20);
		int starty = (int)ofRandom(20, ofGetHeight() - 20);

		for (int i = startx; i < startx + 10; i++) {
			for (int j = starty; j < starty + 10; j++) {
				float a = 1;
				float b = 1;
				grid[i][j] = new Cell(a, b);
				prevGrid[i][j] = new Cell(a, b);
			}
		}
	}
}

float dA = 1.0;
float dB = 0.5;
float feed = 0.055;
float k = 0.062;

//--------------------------------------------------------------
void updateGrid() {
	for (int i = 1; i < appWidth - 1; i++) {
		for (int j = 1; j < appHeight - 1; j++) {
			Cell* spot = prevGrid[i][j];
			Cell* newspot = grid[i][j];

			float a = spot->a;
			float b = spot->b;

			float laplaceA = 0;
			laplaceA += a * -1;
			laplaceA += prevGrid[i + 1][j]->a*0.2;
			laplaceA += prevGrid[i - 1][j]->a*0.2;
			laplaceA += prevGrid[i][j + 1]->a*0.2;
			laplaceA += prevGrid[i][j - 1]->a*0.2;
			laplaceA += prevGrid[i - 1][j - 1]->a*0.05;
			laplaceA += prevGrid[i + 1][j - 1]->a*0.05;
			laplaceA += prevGrid[i - 1][j + 1]->a*0.05;
			laplaceA += prevGrid[i + 1][j + 1]->a*0.05;

			float laplaceB = 0;
			laplaceB += b * -1;
			laplaceB += prevGrid[i + 1][j]->b*0.2;
			laplaceB += prevGrid[i - 1][j]->b*0.2;
			laplaceB += prevGrid[i][j + 1]->b*0.2;
			laplaceB += prevGrid[i][j - 1]->b*0.2;
			laplaceB += prevGrid[i - 1][j - 1]->b*0.05;
			laplaceB += prevGrid[i + 1][j - 1]->b*0.05;
			laplaceB += prevGrid[i - 1][j + 1]->b*0.05;
			laplaceB += prevGrid[i + 1][j + 1]->b*0.05;

			newspot->a = a + (dA*laplaceA - a * b*b + feed * (1 - a)) * 1;
			newspot->b = b + (dB*laplaceB + a * b*b - (k + feed)*b) * 1;

			newspot->a = ofClamp(newspot->a, 0, 1);
			newspot->b = ofClamp(newspot->b, 0, 1);
		}
	}
}

void swap() {
	std::vector<std::vector<Cell*>> temp = prevGrid;
	prevGrid = grid;
	grid = temp;
}

//--------------------------------------------------------------
void ofApp::update() {
	for (int i = 0; i < 1; i++) {
		updateGrid();
		swap();
	}
}

//--------------------------------------------------------------
void ofApp::draw(){
	img.grabScreen(0, 0, appWidth, appHeight);
	ofPixels& pixels = img.getPixels();
	for (int i = 0; i < appWidth; i++) {
		for (int j = 0; j < appHeight; j++) {
			Cell* spot = grid[i][j];
			float a = spot->a;
			float b = spot->b;
			pixels.setColor(i,j, (a - b) * 255);
		}
	}
	img.update();
	img.draw(0, 0);
}

//--------------------------------------------------------------
void ofApp::keyPressed(int key){

}

//--------------------------------------------------------------
void ofApp::keyReleased(int key){

}

Hey, well, there are some things in C++ and oF that can help speed up an application. Here are a few thoughts:

  1. Minimize copies of large objects and try to use references instead. Passing large objects to a function by copy can slow down the execution of the function because the memory for the copy must be allocated and then the copy must be made before the function can execute.
  2. Do all the updating in ofApp::update(), and use ofApp::draw() for drawing.
  3. See if you can use std::swap() on the vectors in your swap() function, instead of making copies of them.
  4. I didn’t see any .push_back() with the std::vectors, right? And they’re also not resized outside of ofApp::setup(), right? That’s good, because a vector has to reallocate and move its elements when it grows past its capacity, and that can slow things down quite a bit sometimes, especially when the vectors are big and have to be copied from one place to the next.
  5. You’ve used a lot of “new” with the pointers, but I don’t see any “delete”. So just keep an eye on that. Raw pointers that are never deleted leak memory. C++ has smart pointers (std::unique_ptr, std::shared_ptr), which may be an option, but there is some overhead in using them. And you may find you don’t need the pointers at all, and can just use the objects, especially if they are small.
  6. Avoid file I/O or memory-intensive things that happen every cycle. This would be something like loading an ofImage or allocating an ofFbo in every cycle of ofApp::update(), for example.
  7. Can the application potentially run parallel code? There are a few ways to do this, depending on the situation. If you have multiple cores and threads available to use, the TBB library can be helpful. There is a recent forum thread about it here: Parallel for-loop. And then ofThread is super nice for things that can run independently in the background on their own threads and that might otherwise hold up the execution of the application.
  8. Also, I almost forgot that storing data in textures and using shaders to compute values can give a huge performance boost. This might work really well here. Have a look at the gpuParticleSystemExample for how it uses textures for position and velocity data, which are updated by the GPU.
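
To make tips 3 to 5 a bit more concrete, here is a minimal sketch that is not taken from the posted code. It assumes the cells can be stored by value in one contiguous buffer, and the names CellValue, FlatGrid and swapGrids are made up for illustration. With no per-cell new there is nothing to delete, and std::swap only exchanges the vectors' internal buffers instead of copying every element.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical value-type cell: no virtual functions, no heap allocation per cell.
struct CellValue {
    float a = 1.0f;
    float b = 0.0f;
};

// Hypothetical flat grid: one contiguous row-major buffer
// instead of a vector of vectors of pointers.
struct FlatGrid {
    int width;
    int height;
    std::vector<CellValue> cells;

    FlatGrid(int w, int h)
        : width(w), height(h), cells(static_cast<std::size_t>(w) * h) {}

    // Bounds-checked (in debug builds) access to the cell at column x, row y.
    CellValue& at(int x, int y) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return cells[static_cast<std::size_t>(y) * width + x];
    }

    const CellValue& at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return cells[static_cast<std::size_t>(y) * width + x];
    }
};

// Exchanges the two grids by moving their buffers; no cell is copied.
void swapGrids(FlatGrid& current, FlatGrid& previous) {
    std::swap(current, previous);
}
```

In the posted code, the smallest version of the same idea would be calling std::swap(grid, prevGrid) inside swap(), which avoids copying both outer vectors every frame.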

all good tips –

from your code, it looks like you are running a reaction diffusion algorithm that is sized to the window. You might see what the frame rate is like if you start with a smaller window by adjusting the size in main (in my experience when I did RDF on CPU, I usually simulated something like a 500x500 grid of cells…). shader based RDF can be faster because it’s massively parallel.

one other thing I suggest is just commenting out parts of your code to see if it’s the simulation or the drawing that’s slow. Also, this is more advanced, but I will sometimes try a profiler (in Xcode, Instruments has one called Time Profiler) that will give some timing info, so you can see which functions are the slowest, etc.

Yeah, measuring the timing can help a ton to identify bottlenecks where a little work might have a big payback. Sometimes I’ll do the following just to get a quick idea of how changes I make to a section of code might affect the time it takes to run it:

float startTime = ofGetElapsedTimef();
callSomeFunction(); // placeholder for the code being timed
ofLogNotice() << "elapsed time: " << ofGetElapsedTimef() - startTime << " s";
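
If you'd rather not depend on oF for the measurement, a similar check can be sketched with std::chrono. The helper names timeMillis and averageMillis are hypothetical, and averaging over a few runs is just one way to smooth out noise, not the only one.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs fn once and returns the elapsed wall time in milliseconds.
template <typename Fn>
double timeMillis(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: runs fn `runs` times and returns the mean time per run.
template <typename Fn>
double averageMillis(Fn&& fn, int runs) {
    assert(runs > 0);
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        total += timeMillis(fn);
    }
    return total / runs;
}
```

For example, calling averageMillis([] { updateGrid(); }, 10) and doing the same for the pixel loop in draw() should show which of the two dominates the frame.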

You can port your C++ code to OpenCL kernels to take advantage of the many, many cores on the GPU. The code has to fit the single-instruction, multiple-data approach, though.
Reaction diffusion fits well.
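
As a rough illustration of why it fits (plain C++ here, not OpenCL): every output cell reads only the previous grid, so any band of rows can be computed independently of the others. The sketch assumes the grid is stored as flat row-major float buffers. RDParams, laplacian, clamp01 and stepRows are hypothetical names, while the stencil weights and constants are the ones from the posted code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Constants copied from the posted code (k is called kill here).
struct RDParams {
    float dA = 1.0f;
    float dB = 0.5f;
    float feed = 0.055f;
    float kill = 0.062f;
};

// 3x3 weighted Laplacian at interior cell (x, y):
// centre -1, edge neighbours 0.2, corner neighbours 0.05.
inline float laplacian(const std::vector<float>& v, std::size_t w,
                       std::size_t x, std::size_t y) {
    const std::size_t i = y * w + x;
    return -v[i]
         + 0.2f  * (v[i - 1] + v[i + 1] + v[i - w] + v[i + w])
         + 0.05f * (v[i - w - 1] + v[i - w + 1] + v[i + w - 1] + v[i + w + 1]);
}

inline float clamp01(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Updates interior rows [rowBegin, rowEnd). It reads only prevA/prevB and writes
// only nextA/nextB, so disjoint row bands do not interfere with each other.
void stepRows(const std::vector<float>& prevA, const std::vector<float>& prevB,
              std::vector<float>& nextA, std::vector<float>& nextB,
              std::size_t width, std::size_t height,
              std::size_t rowBegin, std::size_t rowEnd, const RDParams& p) {
    assert(width >= 3 && height >= 3);
    assert(rowBegin >= 1 && rowBegin <= rowEnd && rowEnd <= height - 1);
    assert(prevA.size() == width * height && prevB.size() == width * height);
    assert(nextA.size() == width * height && nextB.size() == width * height);

    for (std::size_t y = rowBegin; y < rowEnd; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            const std::size_t i = y * width + x;
            const float a = prevA[i];
            const float b = prevB[i];
            const float abb = a * b * b;  // reaction term shared by both updates
            nextA[i] = clamp01(a + p.dA * laplacian(prevA, width, x, y)
                                 - abb + p.feed * (1.0f - a));
            nextB[i] = clamp01(b + p.dB * laplacian(prevB, width, x, y)
                                 + abb - (p.kill + p.feed) * b);
        }
    }
}
```

Each band could then go to its own thread or a TBB parallel loop, and on the GPU the same per-cell body would become one work-item per cell.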

“analog Not analog” is based on openFrameworks using OpenCL for acceleration.

OpenCL as used in aNa. Most are from GollyGang / Ready