Bug using stopThread+waitForThread? (occurs in ofTCPServer::close)

hi!

i am using ofTCPServer and a lot of times when calling close() i get a SIGABRT from ofThread::waitForThread,
more specifically from this line: https://github.com/openframeworks/openFrameworks/blob/master/libs/openFrameworks/utils/ofThread.cpp#L105

i think i traced it back to a problem in ofThread:

class HardWorkingThread : public ofThread{
public:
	void threadedFunction(){
		ofSleepMillis(1);
	}
};

for( int i = 0; i < 1000; i++ ){
	cout << "starting " << i << endl;
	HardWorkingThread thread;
	thread.startThread();
	ofSleepMillis(1);
	thread.stopThread(); // tell the thread it should stop doing work
	thread.waitForThread(false); // wait until it's actually done
}

this crashes somewhere deep inside poco/the os with sigabrt.
(alternatively you can create+setup+close a tcp server, yields the same error)


i investigated a bit. it seems there is a race condition in waitForThread. i put comments with brackets to indicate what i think is happening in which order.

void ofThread::run(){
	// (1) jump into user function
	threadedFunction();
    _threadRunning = false;
	// (2) user function finishes for whatever reason (usually stopThread() was called)
	// notice we are still inside the poco thread, so thread.isRunning() will say YES!
   [...cleanup, detach thread, ...] 
	// (4) soon thread.isRunning() will say no. 
}

void ofThread::waitForThread(bool callStopThread, long milliseconds){
    // (3) is the poco thread still running ... yes!
	if(thread.isRunning()){
		threadBeingWaitedFor = true;
		[...do stuff...]
		if (INFINITE_JOIN_TIMEOUT == milliseconds ){
			// (5) joining a dead thread --> SIGABRT
    		thread.join();

notice that if things happen in a slightly different order (e.g. 13245) the bug still happens. a quick fix is to change this line:

// (3) old: check if poco thread is running
if(thread.isRunning()){
// (3) new: check if user function finished running
if(_threadRunning){

there might still be corner cases where this fails. to be supersafe maybe a lock should be used?

_
_


ps. new error: sometimes my testcode generates a malloc error after creating 500+ threads now, but that’s hopefully another story :slight_smile: