Parallel computing

As update() methods are quite common in OF apps, I’m guessing having several objects without interdependency updated at the same time would be a good idea.

I was wondering what people do in OF to parallelize for loops. There's OpenMP, and I found this pretty old thread. Maybe now people use Poco to do this? I'm kind of a newb on the matter. I'm looking for an introduction to parallel computing; I know about ofThread, but I'd like to learn more beyond that.

Cheers!

The biggest gotcha with multi-threading in OF is that you can't do anything requiring access to the graphics card (OpenGL calls) outside of the main thread. Many built-in classes which contain a texture reference can be configured not to use their texture, so that only CPU-based processing occurs during an update.

I’ve used both OpenMP and ofThread extensively and it really depends on your use case as to which one makes more sense. Generally, tasks can fall into two categories: temporary parallelization of a large but similar workload, or continuous processes which don’t rely on the main process much; a producer/consumer workflow fits this perfectly.

For the former group, and in the case of calling update on multiple unrelated items which don't otherwise require a thread, OpenMP is the better choice. It's very simple to create tasks which run on separate threads and can even share memory. You can also use OpenMP to speed up CPU-based pixel operations, large particle interactions, or anything else that is easily parallelizable (many objects that do the same thing). It should be noted, however, that in some cases running these types of workloads on the GPU in a shader or OpenCL can be just as fast as or faster than the CPU-only OpenMP approach.

The latter group, where you have a process doing something repetitive which isn't necessarily tied to the rendering pipeline, can definitely be done with ofThread or some other threading library. As I mentioned earlier, a producer/consumer workflow is a great candidate for a separate thread or combination of threads. I've used this model in ofxVideoRecorder, where I need to output frames of video and audio as fast as possible, so I have worker threads that wait until a frame is ready and ship it off to be encoded. I've also used a separate thread when grabbing video from a webcam to store frames in a buffer, so that I don't miss any frames between the main program's update() calls.

If you have any specific questions regarding either OpenMP or using ofThread, feel free to ask. Good luck!


Thanks a ton Tim! So, about your first paragraph: if a class has access to GL operations, will it give me trouble parallelizing, even if I don't call such functions in update? Shouldn't it be enough to join all threads at the end of update?

I usually end up with lots of unrelated items, these are the ones I’d like to optimize using OpenMP. I often end up with things like:

void update(){
    for (int i = 0; i < n_objects; ++i) {
        my_vector_of_objects[i].update();
    }
}

I will check how OpenMP can be used for such cases.

Best!

If your update method doesn’t do anything that touches the GPU, then you can definitely use OpenMP to parallelize that type of loop. If it does make any OpenGL calls, consider splitting those out into the draw method for your objects, which would be run in an unparallelized loop inside of the ofApp’s draw method.

It should be as simple as including <omp.h> and linking the library, then adding the compiler directive:

void update(){
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n_objects; ++i) {
        my_vector_of_objects[i].update();
    }
}

where you can replace num_threads(4) with however many threads you want it to split into. There are other ways of setting the number of threads to use, as well as querying the maximum number of threads your system supports in hardware; search the OpenMP docs and you should find some examples.


Hello

For a long while I've been using exactly the OMP process above to parallelise a series of simple but repetitive tasks in a for loop. It has proven super valuable, to the extent that (being on a Mac) I've ended up holding on to an old version of Xcode (4.5) in order to preserve compiler support, as Apple have kindly and gradually killed off the friendly GCC OpenMP-supporting compilers in favour of LLVM.

With Yosemite coming out I can't even do that, as I have been forced to move to Xcode 6.1 and Apple LLVM 6.0. This has left me completely high and dry: a series of apps that I've spent a long, long time on have this kind of OpenMP trick at the heart of making them run smoothly, and they now essentially don't run well enough to be meaningfully useful.

Apparently there are OpenMP-supporting LLVM compilers available, but setting these up as Xcode compilers has so far proven beyond me. I also don't want to roll back to an older OS X.

Can anyone suggest a way forward or a workaround? Is there any way to get OpenMP reliably working again within Xcode 6.1, or is there a simple alternative for parallelising for loops?

In hope
Sam

Well,

Here I am answering my own question. It has taken a while and I thought (given the months++ this has had me stumped for) others who pass this way in future might appreciate (or add to) the solution.

  • If you are on a Mac with Xcode 6.1 or later, you won't be able to use OpenMP (or, good luck trying; I've failed at length)

  • Apple have an alternative, which is Grand Central Dispatch (GCD). It is a little more involved, code-wise.

  • To simplify this, in the zip file below I've wrapped up two approaches to GCD as functions that can be called in a single line. These are essentially two parallel_for operations, each taking just one line.

  • Both operations are held within a ‘parallelise’ class object

  • The first operation, 'for_all', takes as parameters a function, a for-loop start, and a for-loop end

  • The second operation, 'for_batch', takes as parameters a function, a for-loop start/end, and a batch size which determines the degree of concurrency (i.e. allows tuning tasks relative to thread startup overhead)

So the call is:

parallelise.for_all(&ofApp::function, 10, 20);

or

parallelise.for_batch(&ofApp::function, 30, 40, 2);

where the function is:

void ofApp::function(int a){
    cout << a; // do stuff here
}

Hope that helps others; here’s the file: concurrency.zip (72.1 KB)

Sam
