simple particle system and threads

Hi,

I’m building a particle system, which works well enough for my use, but I’m trying to make it more efficient.

I tried to put the update code in a thread (by update I mean changing positions and killing particles after a time),
but I get crashes during drawing, because (I suppose…) my thread deletes dead particles while the draw procedure is trying to draw them.
I thought about doing the deletion outside the thread, but I think that way the thread will crash trying to move particles deleted by the main loop…

What should the algorithm look like to get a threaded particle system?
How can I protect procedures from accessing non-existing elements (a mutex, lock(), what is that?)? Waiting on signals???
Can people who have implemented a particle system (with a deletion system) share advice and experience with such systems?

Dudley

making a particle system threaded will not make it more efficient (efficient meaning something like “takes less instructions to compute”). it might make it faster (“takes less time to compute”). but only if you have a multicore processor and you were doing some other hard work on the same core before. for example, if you’re doing heavy computer vision and have a particle system on the same core, it will be faster when you move the particle system to a separate core. it will be fastest if you move it to multiple cores – using multiple threads :wink:

the general solution to the problem you’re having is to use locks. let’s say your particle system has a vector of particles, and is doing draw() in one thread and update() in another. something like this:

  
  
```cpp
class ParticleSystem : public ofThread {
protected:
	vector<Particle> particles;
	void threadedFunction() { while(true) { update(); } }
public:
	void draw() {
		for(int i = 0; i < particles.size(); i++) { particles[i].draw(); }
	}
	void update() {
		for(int i = 0; i < particles.size(); i++) { particles[i].update(); }
		// ... remove dead particles
	}
};
```

this is wrong. your compiler won’t give you an error, because it doesn’t have a deep enough understanding of what you’re doing. but you can tell there is a problem just by looking at update() and draw() and seeing that they both use the same data, and at least one is modifying that data. if, at some point, your update() removes some dead particles while draw() is trying to draw them, you’re going to get a crash.

the most basic use of a lock looks like this:

  
  
```cpp
class ParticleSystem : public ofThread {
protected:
	vector<Particle> particles;
	void threadedFunction() { while(true) { update(); } }
public:
	void draw() {
		lock();
		for(int i = 0; i < particles.size(); i++) { particles[i].draw(); }
		unlock();
	}
	void update() {
		lock();
		for(int i = 0; i < particles.size(); i++) { particles[i].update(); }
		// ... remove dead particles
		unlock();
	}
};
```

this will stop the crashing, because now only one of the two threads will have access to the data at any given moment. unfortunately, because the lock()/unlock() surround the entire method, you’re losing most of the benefits of threading.

this is where buffering comes in. i’ll mention double and triple buffering.

double buffering means creating a buffer for the data in each thread, and locking only to copy that data.

  
  
```cpp
class ParticleSystem : public ofThread {
protected:
	vector<Particle> particlesDraw, particlesUpdate;
	void threadedFunction() { while(true) { update(); } }
public:
	void draw() {
		lock();
		for(int i = 0; i < particlesDraw.size(); i++) { particlesDraw[i].draw(); }
		unlock();
	}
	void update() {
		for(int i = 0; i < particlesUpdate.size(); i++) { particlesUpdate[i].update(); }
		// ... remove dead particles from particlesUpdate
		lock();
		particlesDraw = particlesUpdate; // copy from one thread to the other
		unlock();
	}
};
```

because the copy operation should be pretty fast, you don’t have to worry about update() hanging draw(). but maybe draw() takes a long time, and you don’t want it to hang up update(). this is where triple buffering comes in. if double buffering is about creating space for each thread, triple buffering is about introducing space between the threads. sometimes these are called “front”, “middle” and “back”. i’ll keep the same names, and just add “middle”:

  
  
```cpp
class ParticleSystem : public ofThread {
protected:
	vector<Particle> particlesDraw, particlesMiddle, particlesUpdate;
	void threadedFunction() { while(true) { update(); } }
public:
	void draw() {
		lock();
		particlesDraw = particlesMiddle;
		unlock();
		for(int i = 0; i < particlesDraw.size(); i++) { particlesDraw[i].draw(); }
	}
	void update() {
		for(int i = 0; i < particlesUpdate.size(); i++) { particlesUpdate[i].update(); }
		// ... remove dead particles from particlesUpdate
		lock();
		particlesMiddle = particlesUpdate;
		unlock();
	}
};
```

double buffering already guarantees that both threads never access the same data at the same time. but triple buffering adds another guarantee: the code in between lock()/unlock() is as fast as possible, minimizing the possibility that either thread will hang the other.

One thing I’ll quickly mention is my general apprehension about using stl / vectors in any threaded system – they are very much not thread safe, which is why you will need a mutex (lock / unlock) to keep the threads from stomping on each other, even if one is just reading and the other is writing. Because functions like size() can be altered if you add or remove elements, you may have thread-safety issues drawing in one thread and animating in another. I like Kyle’s solution, but my intuition is that you might not even need buffering if you use a more traditional array of particles as opposed to a vector. You can pre-allocate a large size and turn particles on or off as needed. It might require a bit of work on your part to add / delete elements dynamically in a non-dynamically-sized array.

zach has a really good point, but i want to clarify a little. his example might seem easier, but it’s only because he has a more advanced understanding of thread safety.

using a static array might look like this:

  
  
```cpp
class ParticleSystem : public ofThread {
protected:
	static const int n = 1024;
	Particle particles[n];
	void threadedFunction() { while(true) { update(); } }
public:
	void draw() {
		for(int i = 0; i < n; i++) { particles[i].draw(); }
	}
	void update() {
		for(int i = 0; i < n; i++) { particles[i].update(); }
		for(int i = 0; i < n; i++) { particles[i].checkIfDead(); }
	}
};
```

technically, this breaks the rule from my previous post: multiple threads are accessing the same data, and at least one is modifying it. so long as you never break this rule, you will never get a crash. but as you acquire a deeper understanding of what’s happening (like zach), there are some ways to bend this rule. i think this is why threading seems so complicated. on the surface, it’s really just that one rule. but as you learn more you realize you can bend the rule in a few different ways.

here’s why you can bend this rule with a static array, but not a vector:

this code:

  
  
```cpp
void draw() {
	for(int i = 0; i < n; i++) { particles[i].draw(); }
}
```

requires two things. it needs to know the size of the array (n), and it needs to know the location in memory of each particle (particles). if either of these is changed by update(), we can get a crash. fortunately, we’ve defined the size of the array to be constant, so that’s set. and when you have a static array of particles, each particle keeps a fixed location in memory for the lifetime of the array.

with a vector, the size might change *and* the memory locations might change. if we use a fixed-size vector, then the size definitely won’t change. but stl does not guarantee that the memory locations will stay constant.*

that’s why you can use a static array, but not a vector.

unfortunately, that’s not all. if we take this route of avoiding lock(), things are still a little more complicated. let’s look at Particle and the draw() method:

  
  
```cpp
class Particle {
protected:
	bool dead;
	ofVec2f position;
public:
	void draw() {
		if(!dead) {
			ofCircle(position, 4);
		}
	}
	// ...
};
```

draw() needs two things: dead and position. let’s say the only other place we’re using dead looks like this:

  
  
```cpp
dead = true;
```

this is what’s called an “atomic” operation. that means that the CPU can’t be interrupted while doing the operation (“atomic” here means “indivisible”). an atomic operation is like surrounding a single operation with lock()/unlock().

but position might not be involved in an atomic operation. for example, maybe in update() we say:

  
  
```cpp
position.x += velocity.x;
position.y += velocity.y;
```

(you should use the overloaded ofVec2f operations :slight_smile: but just for example). this means that the execution order of the code might look like:

  
  
```cpp
...
position.x += velocity.x; // update thread
ofCircle(position, 4); // draw thread
position.y += velocity.y; // update thread
...
```

this will give you weird glitches when things are drawn. it won’t crash, but it won’t look right.

this case isn’t so bad, because the particle is probably moving a small amount. but if you change your code and do something a little more complex inside update(), you will have bigger glitches. for example, let’s say you’re doing something like an average of multiple velocities (weird example, i know):

  
  
```cpp
for(int i = 0; i < velocities.size(); i++) { position += velocities[i]; }
position /= velocities.size();
```

in this case your execution order might look like:

  
  
```cpp
...
for(int i = 0; i < velocities.size(); i++) { position += velocities[i]; } // update thread
ofCircle(position, 4); // draw thread
position /= velocities.size(); // update thread
...
```

and your particles will be drawn in crazy locations because the division hasn’t happened yet.

in short, zach’s solution can really speed things up. but it also requires a deeper understanding of thread safety, and can create bugs that are hard to track down. i recommend using double or triple buffering for now, because it’s easier and will behave more intuitively.

*in general, if you’re not adding/removing elements, then the vector memory location should stay constant, but it’s not guaranteed. stl calls itself “thread safe, but only if you’re not modifying the data”. to me (and zach) this doesn’t feel very “thread safe” at all :slight_smile: you can read more about stl here http://www.sgi.com/tech/stl/thread-safety.html

> this will give you weird glitches when things are drawn. it won’t crash, but it won’t look right.

true true, you bring up a good point / good case for buffering, and a really great / detailed response.

If I were using static arrays, I guess I’d lock and unlock around a memcpy to avoid the glitches: keep a copy of the data locally within the thread, and a copy for the draw function, and lock around memcpys to a third buffer. I’ve actually seen these kinds of glitches way back in the day on a very old, threaded project (where we were rendering on multiple machines, and a thread was receiving the visual data across the network) and they aren’t pretty :slight_smile: I had no idea what they were at the time, but this thread reminds me of the kinds of read / write problems you can have with two threads.

one of the reasons I jumped on the STL / vector thread-safety issue is that I’ve been bitten by it crashing many times, and it took me a while to understand how functions like size(), which seem like they just “read” data, can be not thread safe, ie:

http://stackoverflow.com/questions/2377712/on-macosx-using-g-is-stdvector-size-thread-safe

because it’s a bit more opaque how the stl collections work under the hood, you have to think harder about how to make them safe across threads.

Wow, thanks for that lesson from basics.

cheers

Just to clarify why exactly you can get crashes when accessing memory from different threads, especially with stl vectors.

So actually, accessing memory from 2 or more threads at the same time, even for writing, doesn’t by itself cause a crash. This is perfectly legal:

  
  
```cpp
int a[100];

...
// thread 1
for(int i = 0; i < 100; i++){
   a[i] = i;
}

// thread 2
for(int i = 0; i < 100; i++){
   cout << a[i] << endl;
}
```

The crashes you get when accessing memory from different threads are segfaults (segmentation faults), which means you are trying to access a region of memory that the OS has not allocated to your process, so the OS kills your program for security reasons (so you cannot read other programs’ memory) by sending it a segfault signal. You can simulate this by writing in a terminal:

  
```
kill -s SIGSEGV PID
```

where PID is the id of the program, which can be looked up in the system monitor or with the ps command in a terminal.

When you ask for memory (by creating an array, using new, or resizing a vector), if you don’t have enough memory left in the segments already assigned to you, the OS assigns you a new segment of a certain size and gives you a pointer to the first memory address in that segment. If you try to access a memory address in a segment that is not yours, you get a segfault.

The problem with vectors is that they reallocate: if you resize a vector to make it bigger, there’s a chance that all its contents will be moved to a different segment in memory. If one thread makes a vector grow while another tries to access it at the same time, you can get a segfault: the accessing thread may look in the old segment after the vector’s contents have moved to a new one and the OS has marked the old segment as not yours.

A safer way to work with vectors is to use them like arrays: call resize() once at the beginning and then access elements with [] instead of constantly using push_back (it’s faster too), or call reserve() at the beginning and then push_back. That way the memory is not reallocated. Also, when a vector shrinks, the memory it has reserved is not freed, so you don’t get reallocations there either, and you keep all the benefits of the oo api of the stl vector. Like:

  
  
```cpp
vector<int> a(100);
// or to initialize all the memory to 0
// vector<int> a(100, 0);

...
// thread 1
for(int i = 0; i < a.size(); i++){
   a[i] = i;
}

// thread 2
for(int i = 0; i < a.size(); i++){
   cout << a[i] << endl;
}
```

or:

  
  
```cpp
vector<int> a;
a.reserve(100);

...
// thread 1
for(int i = 0; i < 100; i++){
   a.push_back(100);
}

// thread 2
for(int i = 0; i < a.size(); i++){
   cout << a[i] << endl;
}
```

In all 3 cases – arrays, vector + resize, or vector + reserve – you can get glitches because you may access memory that hasn’t been assigned a value yet, but no crashes.

The main advantage of using vector + resize is that you can use vector.size() instead of some constant, and a = b instead of memcpy, and it’s as fast and as thread safe as an array.


wow, that’s great arturo. one of those things i learned as an undergrad but forgot somewhere along the way :slight_smile: that clarifies the meaning of segfault a bunch.


posts like arturo’s tell me a karma system on the forums would sometimes be useful :slight_smile: great read, great insights, thanks!