Understanding Multi-threading

I’m using JPEG sequences to write and read four streams of video, and drawing the direct camera feed as well, at a stable 27 fps. I’m trying to apply ofxCv::blur() to each of the five images separately, but that sinks the frame rate to around 20 fps when three or more images are active.

Below is the code where I apply the blur; the second draw() is for the direct camera feed. Any ideas to improve performance? Is there another, better-performing blur? Or am I just hitting the limits @fresla has warned me about with using a webcam/oF to record and play videos?

void CameraPlayer::applyBlur(ofImage& img)
{
	int scaledBlur = blurRadius * 100;
	ofxCv::blur(img, scaledBlur);
	img.update();
}

void CameraPlayer::draw(ofxTurboJpeg& turbo, ofRectangle& rect)
{
	ofSetColor(255, 255, 255, amplitude * 255);
	ofImage img; 
	string file = std::to_string(currentFrame) + ".jpg";
	turbo.load(img, file);
	if (!img.isAllocated())
	{
		return;
	}
	if (blurRadius > 0.02f)
	{
		applyBlur(img);
	}
	img.draw(rect);
}

void CameraPlayer::draw(ofVideoGrabber& cam, ofRectangle& rect)
{
	ofSetColor(255, 255, 255, amplitude * 255);
	if (blurRadius <= 0.02f)
	{
		cam.draw(rect);
		return;
	}
	ofImage c = cam.getPixels();
	applyBlur(c);
	c.draw(rect);
}

A faster approach would be to use a shader to apply the blur.

Take a look at the examples/shaders/09_gaussianBlurExample/ and modify it with:

//--------------------------------------------------------------
void ofApp::setup(){
	
	if(ofIsGLProgrammableRenderer()){
		shaderBlurX.load("shadersGL3/shaderBlurX");
		shaderBlurY.load("shadersGL3/shaderBlurY");
	}else{
		shaderBlurX.load("shadersGL2/shaderBlurX");
		shaderBlurY.load("shadersGL2/shaderBlurY");
	}

	image.load("img.jpg");
	
	fboBlurOnePass.allocate(image.getWidth(), image.getHeight());
	fboBlurTwoPass.allocate(image.getWidth(), image.getHeight());
}

//--------------------------------------------------------------
void ofApp::update(){

}

//--------------------------------------------------------------
void ofApp::draw(){
	
	float blur = ofMap(mouseX, 0, ofGetWidth(), 0, 15, true);
	
    fboBlurTwoPass.begin();
        image.draw(0,0);
    fboBlurTwoPass.end();
    
    int numPasses = 3;
    for(int d = 0; d < numPasses; d++){
        
        float blurPerPass = ofMap(d, 0, numPasses, blur, blur / numPasses);
    
        //----------------------------------------------------------
        fboBlurOnePass.begin();
        
        shaderBlurX.begin();
        shaderBlurX.setUniform1f("blurAmnt", blurPerPass);

        fboBlurTwoPass.draw(0, 0);
        
        shaderBlurX.end();
        
        fboBlurOnePass.end();
        
        //----------------------------------------------------------
        fboBlurTwoPass.begin();
        
        shaderBlurY.begin();
        shaderBlurY.setUniform1f("blurAmnt", blurPerPass);
        
        fboBlurOnePass.draw(0, 0);
        
        shaderBlurY.end();
        
        fboBlurTwoPass.end();
    }
    
    //----------------------------------------------------------
    ofSetColor(ofColor::white);
    fboBlurTwoPass.draw(0, 0);

}

That should give you a nice smooth image.
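For anyone wondering why the example uses two shaders and two FBOs: the blur is split into a horizontal pass and a vertical pass, each reading the result of the previous one. Here is a rough CPU sketch of that same two-pass structure in plain C++. It assumes a simple box filter rather than the weights used in the example shaders, and Gray, blurPass and blurTwoPass are made-up names for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grayscale image, stored row by row.
struct Gray {
    int width, height;
    std::vector<float> data; // width * height values
    Gray(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, 0.0f) {}
    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
};

// One 1D box-blur pass. (dx, dy) = (1, 0) blurs along X, (0, 1) along Y.
Gray blurPass(const Gray& src, int radius, int dx, int dy) {
    assert(radius >= 0);
    Gray dst(src.width, src.height);
    const float taps = static_cast<float>(2 * radius + 1);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                // Clamp sample coordinates to the image edge.
                const int sx = std::min(std::max(x + k * dx, 0), src.width - 1);
                const int sy = std::min(std::max(y + k * dy, 0), src.height - 1);
                sum += src.at(sx, sy);
            }
            dst.at(x, y) = sum / taps;
        }
    }
    return dst;
}

// X pass into one buffer, then Y pass into another, like the two FBOs.
Gray blurTwoPass(const Gray& src, int radius) {
    return blurPass(blurPass(src, radius, 1, 0), radius, 0, 1);
}
```

Each pixel reads 2 * (2 * radius + 1) samples this way instead of (2 * radius + 1) squared for a single 2D pass, which is the main reason for splitting the blur in two.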

Thank you! That seems to help a bit. Now 4 images at once is mostly stable. 5 drops to 20fps sometimes, but not as much as before.

@s_e_p Just to clarify, my suggestions were in general about expecting pro results from non-pro equipment. Yes, your webcam requires that the image be decoded from H264 to pixel data, and this uses CPU cycles. Webcams can be great in the right context with the right expectations, and may well be for your project too.

The openCV blur also uses CPU cycles. The loading of images and likely other tasks your app is doing are also using up CPU cycles. Altogether this seems to add up to more than your CPU can achieve in the time of a single frame, so your app slows down. You can look at optimising things further. The GPU blur is a great start, and if you have any other effects or manipulation of images, try to make sure they are done on the GPU.

You can also look at threading some functions so they run in parallel to the main thread. This may take some work and is not always easy. There are some good examples in the OF examples folder.

Of course, sometimes you may just have reached the limit of attainable optimisation for the project, or at least for your skills and time budget. Another option that sometimes works, and is sometimes cheaper, is throwing more power at the project: machines with faster CPUs or more cores, or more of whatever resource may help push you over the line.

Hey, thanks for your reply. I think, even with the turboJpeg addon, that the four jpeg sequences + camera drawing are already putting a strain on the CPU. All I’m doing other than that are alpha blending and the blur.

Edit: I think I know what’s happening regarding the problem below. The changes to ofSetColor() in one CameraPlayer are carrying over to the shader pass of the next.

@theo I’m having a weird bug. I have two instances of a class CameraPlayer() that contains the two ofFbo and ofShader from your blur example. In the class, I turn the image on and off by setting the alpha of the image with ofSetColor() and then do alpha blending in ofApp.cpp with ofEnableBlendMode(OF_BLENDMODE_ADD).

For some reason, if I have image A on, then turn image B on, and then fade out image B, image A gets faded out as well, even though printing the values to the console shows that the alpha value sent to ofSetColor() should still be high. This also only happens when doing fboBlurTwoPass.draw(), not with image.draw(), so it seems to have something to do with the shader or FBO. But since the shaders and FBOs are declared within the class, I would think that they wouldn’t interfere with those of other instances. I can make a compact example script to share tomorrow, but just thought I’d ask first in case you or anyone else has an idea.

(N.B. I’m only wrapping the draw() code in the alpha blend, not the blur code. Having the blur code in there too made the image get super white very fast; apparently the multiple passes were each getting alpha-added to each other.)

It looks like the ofxTurboJpeg addon is single-threaded, so creating threads for your saving or loading or both would likely give you more headroom.

Hey, I had my performance recently and just ran it at 20 fps, which worked well enough. I’m looking to optimize it now though, and I’m going through the projects in openFrameworks/examples/threads.

Looking at the “threadChannelExample”, I see that data (ofPixels) is sent in one place with ofThreadChannel::send(), and then received in another with ofThreadChannel::receive(). This receive is in a function called ImgAnalysisThread::threadedFunction(), which seems not to be referenced anywhere except its declaration in ImgAnalysisThread.h and its definition in ImgAnalysisThread.cpp. @arturo’s put a lot of comments in here, but I still don’t get this…

If I’m going to try to apply this technique to modify @armadillu’s ofxTurboJpeg to be multithreaded, I need to understand where to put something like that “threadedFunction” and when to call it. Any help is appreciated.

EDIT: Was just watching this video on multi-threading in c++ and tried to do the same thing in oF

void ofApp::setup(){
	std::thread worker1(function1);
	std::thread worker2(function2);
}

void ofApp::function1()
{
	for (int i = 0; i < 200; ++i)
	{
		std::cout << "+";
	}
}

void ofApp::function2()
{
	for (int i = 0; i < 200; ++i)
	{

		std::cout << "-";
	}
}

but VS throws an error at the arguments of worker1 and worker2:

no instance of constructor "std::thread::thread" matches the argument list
       argument types are: (void ())

Does std::thread not work in oF?

Hey @s_e_p, also have a look at the threadExample too. In both examples, the parent class is ofThread, which has the ofThread::threadedFunction() along with a std::mutex for locking and some other stuff to manage the thread. The threadedFunction() has a while loop which runs as long as the thread is active and isThreadRunning() returns true.
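On the question of where threadedFunction() gets called: your own code never calls it directly. You call startThread() on the ofThread subclass, and the newly created thread is what runs threadedFunction(). Below is a small plain-C++ sketch of that pattern using only the standard library. MiniThread, Counter and stopAndWait() are hypothetical names made up for this illustration; this is a simplified model, not how ofThread is actually implemented.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the ofThread pattern (not the real class).
class MiniThread {
public:
    virtual ~MiniThread() {
        // Subclasses must stop the thread in their own destructor.
        assert(!worker.joinable());
    }

    // Spawns the worker; the new thread is what calls threadedFunction().
    void startThread() {
        if (running.exchange(true)) return; // already running
        worker = std::thread([this] { threadedFunction(); });
    }

    // Asks the loop to exit, then waits for the thread to finish.
    void stopAndWait() {
        running = false;
        if (worker.joinable()) worker.join();
    }

    bool isThreadRunning() const { return running; }

protected:
    virtual void threadedFunction() = 0;

private:
    std::atomic<bool> running{false};
    std::thread worker;
};

// Example subclass: counts loop iterations while the thread is active.
class Counter : public MiniThread {
public:
    ~Counter() override { stopAndWait(); }
    std::atomic<int> ticks{0};

protected:
    void threadedFunction() override {
        while (isThreadRunning()) {
            ++ticks;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
};
```

So in this model, the only place the loop gets "called" is inside startThread(), which is why a search for threadedFunction() in the example only turns up its declaration and definition.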

I like using ofThread when I need an autonomous, independent thread to do something; it’s sort of like running a mini-program inside the OF app. But there are other ways to use threads depending on what you’re doing. There are a couple of libraries (TBB, or GCD on macOS) that help with task-based threading for specific tasks, like crunching ofPixels data in a parallel-for loop. The task-based approach abstracts a lot of the details related to thread management, and lets the code focus on the task(s). Using these libs incurs some overhead, but they can be a big help with computationally heavy tasks.
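To give an idea of what a parallel-for over pixel data looks like, here is a hand-rolled sketch with plain std::thread. It is not TBB or GCD (those manage the threads for you), and invertParallel is a hypothetical name; the buffer is just a std::vector of bytes standing in for pixel data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Inverts 8-bit values in place, splitting the buffer into disjoint
// chunks so that no two threads touch the same bytes.
void invertParallel(std::vector<unsigned char>& pixels, unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t chunk = (pixels.size() + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(pixels.size(), t * chunk);
        const std::size_t end = std::min(pixels.size(), begin + chunk);
        if (begin == end) break; // nothing left to hand out
        workers.emplace_back([&pixels, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                pixels[i] = static_cast<unsigned char>(255 - pixels[i]);
            }
        });
    }
    // Wait for every chunk before the caller uses the buffer again.
    for (auto& w : workers) w.join();
}
```

Creating threads on every call has a cost, so something like this probably only pays off for large buffers or heavy per-pixel work.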

There are several threads in the forum about parallel computing and OF. And remember too that only one thread (the OF app) can make OpenGL calls.

Also with regard to this, std::thread should work in OF. But you might need to pass a lambda or a std::function type as an argument instead of a member of ofApp.

Edit: I got this to work for std::thread in OF:

// add a struct in ofApp.h
struct Fn1{
    void print(){
        for(size_t i{0}; i < 200; ++i){
            std::cout << i << std::endl;
        }
    }
};

// to class ofApp, make an instance of struct Fn1 and add fn2():
    void fn2();
    Fn1 fn1;

// in ofApp::setup(), some different ways to thread:
    // use the struct
    std::thread thread1(&Fn1::print, &fn1);
    thread1.join();
    
    // use a lambda
    std::thread thread2([this](){
        for(size_t i{0}; i < 200; ++i){
            std::cout << i << std::endl;
        }
    });
    thread2.join();
    
    // call a fn in ofApp with a lambda
    std::thread thread3([this](){
        fn2();
    });
    thread3.join();

I looked at this reference and then tried the above. Using the lambda to call a function in ofApp seems the clearest to me, and there are probably still more ways to construct std::thread inside ofApp.
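One more way, for completeness: the original error happens because the bare name of a non-static member function is not something std::thread can call, since it needs an object to run on. Passing a pointer to the member function plus this works. The sketch below uses a hypothetical App class as a stand-in for ofApp, and counters instead of std::cout so the result is easy to check.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for ofApp, just to show the std::thread syntax.
class App {
public:
    void setup() {
        assert(!worker1.joinable() && !worker2.joinable()); // setup() runs once
        // Pointer to member function, then the object to call it on.
        worker1 = std::thread(&App::function1, this);
        worker2 = std::thread(&App::function2, this);
    }

    // Join before the threads are destroyed: destroying a joinable
    // std::thread calls std::terminate().
    void exit() {
        if (worker1.joinable()) worker1.join();
        if (worker2.joinable()) worker2.join();
    }

    ~App() { exit(); }

    std::atomic<int> plusCount{0};
    std::atomic<int> minusCount{0};

private:
    void function1() { for (int i = 0; i < 200; ++i) ++plusCount; }
    void function2() { for (int i = 0; i < 200; ++i) ++minusCount; }

    std::thread worker1; // members, so they outlive setup()
    std::thread worker2;
};
```

Note that the threads are members here. If they were local variables in setup() and never joined or detached, the program would terminate as soon as setup() returned.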

One thing that is good to understand about threading is that if your main app / thread is dependent on anything happening in the other threads, you probably won’t get the performance you are hoping for.

E.g., if the main thread is saving out frames continuously and you are trying to speed it up by threading it, it probably won’t help.

However if you need to save an image periodically and the saving / writing to disk is slow and you don’t want it to slow down the main thread, then a thread for the saving of the file makes sense.
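A rough sketch of that second case, using std::async from the standard library: the slow job runs on another thread and the main loop only polls whether it has finished, so it never waits. slowSave here is a hypothetical placeholder that sleeps instead of writing a file, and saveInBackground and isReady are made-up helper names.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Placeholder for a slow disk write (hypothetical: it only sleeps).
int slowSave(int frameNumber) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return frameNumber;
}

// Starts the save on another thread and returns immediately.
std::future<int> saveInBackground(int frameNumber) {
    assert(frameNumber >= 0);
    return std::async(std::launch::async, slowSave, frameNumber);
}

// True once the background task has finished; never blocks the caller.
template <typename T>
bool isReady(const std::future<T>& f) {
    return f.valid() &&
           f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

In update() you would check isReady() once per frame and only call get() when it returns true. If a new save is requested every single frame, though, the jobs pile up faster than they finish, which is the situation described above where threading doesn’t help.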

My suggestion whenever people are trying to speed up an app is to first understand where the slow down is happening, either by using a profiler or by timing different parts of the app until you narrow it down. Then once you understand where the bottleneck is, look at solutions for speeding that part up.
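If you don’t have a profiler handy, timing by hand can be as simple as the sketch below, which uses std::chrono. elapsedMs and budgetShare are hypothetical helper names; the idea is to wrap the load, the blur and the draw separately and compare them against the time available per frame.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns how long it took, in milliseconds.
template <typename F>
double elapsedMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Fraction of one frame's time budget that a section used.
// For example, 20 ms at a 25 fps target is half the frame.
double budgetShare(double sectionMs, double targetFps) {
    assert(targetFps > 0.0);
    return sectionMs / (1000.0 / targetFps);
}
```

For example, something like double ms = elapsedMs([&] { applyBlur(img); }); inside draw() would show how much of the frame the blur takes. Single measurements are noisy, so averaging over many frames gives a more useful number.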


@TimChi Hey, I tried your code, but they all print out consecutively instead of with overlap. Even if I use two threads in the same style. In the video I linked to, multi-threading causes the order of the cout printing to overlap…

@theo Ah, that’s too bad. I’m saving, loading and drawing every frame, so I guess multi-threading won’t help…

If you invoke all three of them together, and then .join() them together, you should get output that is not always consecutive. Printing symbols might help to make it more noticeable.
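In code, the difference is whether join() comes right after each thread is created (the main thread waits, so the output is consecutive) or after all of them have been started. A minimal sketch of the second version follows; runTogether is a hypothetical name, and it collects the symbols in a string guarded by a mutex instead of printing them.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Starts one thread per symbol, and only then joins them all.
// Each thread appends its symbol `count` times to a shared string.
std::string runTogether(const std::string& symbols, int count) {
    assert(count >= 0);
    std::string out;
    std::mutex outMutex;
    std::vector<std::thread> threads;
    for (char symbol : symbols) {
        threads.emplace_back([&out, &outMutex, symbol, count] {
            for (int i = 0; i < count; ++i) {
                std::lock_guard<std::mutex> lock(outMutex); // one append at a time
                out.push_back(symbol);
            }
        });
    }
    for (auto& t : threads) t.join(); // join only after all have started
    return out;
}
```

Calling runTogether("+-*", 200) gives 600 characters, and the symbols may or may not be interleaved. With only 200 iterations a thread can easily finish before the next one has even started, so a larger count makes the overlap more likely to show.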

One thing you can try is to set up queues of data to be loaded or saved, so that each thread has its pool of stuff in its queue (or queues) to work with. But I’m not sure if multiple threads can simultaneously access the filesystem or not.
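Here is roughly what such a queue could look like with just the standard library: a mutex protects the container and a condition variable lets the worker sleep until there is something to do. WorkQueue is a hypothetical name, and this is only loosely modeled on the send/receive idea from the threadChannelExample, not on its actual implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Minimal blocking queue: producers send(), a worker thread receive()s.
template <typename T>
class WorkQueue {
public:
    void send(T item) {
        {
            std::lock_guard<std::mutex> lock(m);
            assert(!closed); // sending after close() is a logic error
            items.push(std::move(item));
        }
        ready.notify_one();
    }

    // Blocks until an item arrives, or until the queue is closed and empty.
    // Returns false only in the second case.
    bool receive(T& out) {
        std::unique_lock<std::mutex> lock(m);
        ready.wait(lock, [this] { return closed || !items.empty(); });
        if (items.empty()) return false;
        out = std::move(items.front());
        items.pop();
        return true;
    }

    // Wakes up any waiting receiver so worker loops can exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m);
            closed = true;
        }
        ready.notify_all();
    }

private:
    std::mutex m;
    std::condition_variable ready;
    std::queue<T> items;
    bool closed = false;
};
```

A worker thread would then loop with while (queue.receive(job)) { ... } and the main thread would send() file names or pixel buffers to it, calling close() on exit.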

STL containers don’t work well with threads unless access is protected by a lock, since operations that resize them can move their contents in memory. C-style arrays seem OK as they have a fixed size and location in memory, as long as threads don’t write to the same elements. Some methods of copying between them are faster than others. The TBB library (or even the STL) may have some thread-safe containers now, but you’d have to look and see if they exist and if they have decent performance.

You might be able to do something like this, which was for a Raspberry Pi. I’m not sure how robust or safe it ended up being, but I think it worked OKish and it could definitely be improved. And you could also adapt the ofxThreadedImageLoader class, maybe create an ofxThreadedImageSaver and add some autonomous queues for data to both classes.

What do you think about using addon helpers to help with threading?
