Parallel for-loop

Hi,
I’d like to use parallel for-loops. I have looked at external libraries, and from what I understand OpenMP should be good for this. However, Apple seems to have made it difficult to use in Xcode (I am using Xcode 12 on Catalina at the moment, but would like my code to be portable between Visual Studio on Windows and Xcode on macOS).

Is there a smart way to parallelize for-loops from within openFrameworks?

Not sure what you mean by a parallel for-loop. Can you give an example of what you want to execute? You can always put everything in a single for (or while) loop. Also, since update/draw are executed at 60 Hz (or higher), are you sure you need parallel processes within each 1/60-second cycle rather than running things one after another?

Sorry. I meant a for-loop whose iterations are not executed sequentially, one after the other, but in parallel, distributed over as many threads as the processor can handle.

I am pretty sure I need parallel processing to speed this up. I am offline rendering (if that’s the right name for it) slitscans for a few hundred videos. The workflow (simplified) is this:

  1. Extract all stills from an existing video using ffmpeg (this seems to be multithreaded by default).
  2. Load all stills into RAM (this is already as fast as the SSD read speed allows).
  3. Using the stills, create a slitscan of the video for each horizontal line and each vertical line of pixels. This creates 1920 + 1080 = 3000 slitscans for each existing video. Dimensions are 1920 * number of stills (for horizontal slitscanning) and number of stills * 1080 (for vertical slitscanning). The number of stills ranges from 1,500 up to 25,000.

For step 3 (above), I am using for-loops to extract pixel lines from the video stills and to join those pixel lines into slitscans. Because the iterations of the for-loop don’t depend on each other, I would like to execute as many of them in parallel as possible.
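To make the independence of the iterations concrete, here is a minimal sketch of building one horizontal slitscan. It is not the poster's actual code: the `Frame` struct and `horizontalSlitscan` function are hypothetical names, frames are assumed to be 8-bit grayscale and all the same size, and a real openFrameworks app would more likely work on `ofPixels` with several channels.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame type: one 8-bit grayscale still, stored row by row.
struct Frame {
    int width;
    int height;
    std::vector<std::uint8_t> pixels; // size = width * height
};

// Builds the horizontal slitscan for pixel row y.
// Output row t is a copy of row y of frame t, so the result is
// width pixels wide and frames.size() pixels tall.
// Assumes every frame has the same dimensions and 0 <= y < height.
std::vector<std::uint8_t> horizontalSlitscan(const std::vector<Frame>& frames, int y) {
    if (frames.empty()) return {};
    const std::size_t width = static_cast<std::size_t>(frames[0].width);
    std::vector<std::uint8_t> scan(width * frames.size());
    for (std::size_t t = 0; t < frames.size(); ++t) {
        const std::uint8_t* src = frames[t].pixels.data() + static_cast<std::size_t>(y) * width;
        std::copy(src, src + width, scan.data() + t * width);
    }
    return scan;
}
```

Each call only reads the shared frames and writes to its own output buffer, so calls for different values of `y` can run at the same time without locking.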

I imagine there is a way to use parallel for-loops, and that this would be easier to write and more efficient to execute than writing worker threads and a handler for those worker threads.

So, for step 3:

for (int i = 0; i < 4000; i++) // <- this normally executes sequentially, one iteration after the other
{
    // execute function f here.
    // don't do it sequentially as usual; do it in parallel,
    // meaning as many executions simultaneously as the (multicore) processor can handle
}
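For reference, a loop like this can also be split across threads with nothing but the C++11 standard library, which builds in both Xcode and Visual Studio without extra dependencies. The following is only a sketch under that assumption; `parallelFor` is a hypothetical helper, not an openFrameworks function, and it assumes the loop body is safe to call from several threads at once.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls body(i) for every i in [begin, end),
// giving each worker thread one contiguous chunk of the range.
void parallelFor(int begin, int end, const std::function<void(int)>& body) {
    if (end <= begin) return;
    const unsigned hw = std::thread::hardware_concurrency(); // may be 0 if unknown
    const int numThreads = std::min(hw == 0 ? 1 : static_cast<int>(hw), end - begin);
    const int chunk = (end - begin + numThreads - 1) / numThreads; // ceiling division

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        const int lo = begin + t * chunk;
        const int hi = std::min(end, lo + chunk);
        if (lo >= hi) break; // no work left for further threads
        workers.emplace_back([lo, hi, &body] {
            for (int i = lo; i < hi; ++i) body(i);
        });
    }
    for (std::thread& w : workers) w.join(); // wait until every chunk is done
}
```

Usage would look like `parallelFor(0, 4000, [&](int i) { f(i); });`. Splitting into equal chunks is the simplest choice; if iterations differ a lot in cost, a library scheduler such as OpenMP or TBB will usually balance the load better.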

Hi @mativa,

I’ve used OpenMP with oF before and it worked great! It’s been a while though. ofThread is a good choice too, depending on the task. I can’t help much with the Xcode part as I use Linux and Qt Creator. But if I recall, I just included omp.h in ofApp.h. It worked well for what you described above (executing a large, computationally heavy loop with several independent threads).

Qt Creator had some issues with compiling it; I never figured out how to get the OpenMP library into the .qbs file the correct way. But the application compiled and ran fine using the makefile and make. I modified the project’s config.make file with:

PROJECT_LDFLAGS=-Wl,-rpath=./libs
PROJECT_LDFLAGS += -fopenmp

Edit: I also added it here in config.make:

PROJECT_CFLAGS += -fopenmp
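With those flags in place, an OpenMP loop is an ordinary for-loop with a pragma in front of it. Below is a minimal sketch, not taken from the original project; `applyGain` is a hypothetical function name, and the loop index is a signed `int` because the OpenMP 2.0 implementation in Visual Studio requires a signed index.

```cpp
#include <vector>

// Multiplies every element by gain. The iterations are independent,
// so OpenMP is free to distribute them across threads.
// If the code is built without -fopenmp, the pragma is ignored
// and the loop simply runs sequentially.
std::vector<float> applyGain(const std::vector<float>& in, float gain) {
    std::vector<float> out(in.size());
    const int n = static_cast<int>(in.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * gain; // each iteration writes only its own element
    }
    return out;
}
```

Because every iteration writes to a different element of `out`, no locking is needed.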

Hi @TimChi
I would love to use OpenMP, but this no longer seems as straightforward in Xcode as it was a couple of years ago. This recent post on Stack Exchange gives clues on how to use OpenMP in Xcode. It involves using a non-Apple compiler that supports OpenMP. I thought there might be a way to avoid this.

With the IDEs and OpenMP, I think it’s a matter of getting the linking correct, or maybe configuring the IDE to find or use the OpenMP library. I don’t think it’s a compiler issue; GCC compiles it (via terminal and makefile) on Linux, and I would think that Clang would too.

You could try using Xcode to write/compile/run the basic application without OpenMP, and then parallelize toward the end and compile it from the terminal. If the loops are in small, separate functions, it’s easy to go through and parallelize them when you’re ready to speed things up.


Cool, I like the idea of just trying a recompile from the terminal at the end. I have the application working but am trying to speed it up.
From what I understand, Apple’s Clang doesn’t compile it anymore. I have used openFrameworks in a basic way for a few years but am completely new to diving into compilers and the like, so the problems mentioned in forum posts around the internet about getting OpenMP up and running in Xcode go well beyond what I actually understand.

There may be a couple of interesting solutions here if you haven’t seen this yet, particularly this one:

"You can also use OpenMP with Apple Clang and Homebrew libomp (brew install libomp). Just replace a command like clang -fopenmp test.c with clang -Xpreprocessor -fopenmp test.c -lomp"
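One way to check whether such flags actually took effect is to test the `_OPENMP` macro, which compilers define when OpenMP support is enabled. The sketch below assumes that `omp.h` can be found by the compiler (with Homebrew's libomp this may need an extra include path); `openmpMaxThreads` is a hypothetical helper name.

```cpp
#ifdef _OPENMP
#include <omp.h> // only available when OpenMP is enabled and installed
#endif

// Returns the number of threads OpenMP would use for a parallel region,
// or 0 if the code was compiled without OpenMP support.
int openmpMaxThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 0;
#endif
}
```

If this returns 0, the `#pragma omp` lines in the project are being ignored and the loops still run on a single thread.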

But yeah, it’s fun to take a working oF app and go through it looking for parallelization opportunities. Then you get to see how much faster things run on a relative basis. ofThread can help parallelize things too; there is a nice example in /examples/threads/threadExample.


Yes, it sounds like you need threads!

A while back I had it working on macOS using these instructions: ofxFAISS/GETTING_STARTED.md at master · bakercp/ofxFAISS · GitHub. I’m not sure whether that approach still works.


Hi, I made ofxTbb, an addon for Intel’s Threading Building Blocks (TBB), which includes the prebuilt library for macOS. You just add it with the project generator and can use it straight away in Xcode (although using TBB itself is not that straightforward).


I second @roymacdonald’s suggestion; TBB is a good option. On Mac you can install it by following these instructions: c++ - What are XCode 8 Environment Variables to run Intel Threading Building Blocks - Stack Overflow. I think the parallel_for method could help you: chryswoods.com | Part 2: tbb::parallel_for
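For orientation, a `tbb::parallel_for` call over a `tbb::blocked_range` could look like the sketch below. This is an illustration rather than code from ofxTbb: `scaleAll` is a hypothetical function, and `USE_TBB` is a hypothetical opt-in macro (build with `-DUSE_TBB` and link with `-ltbb`) so that the same file still compiles, sequentially, where TBB is not installed.

```cpp
#include <cstddef>
#include <vector>

#ifdef USE_TBB // hypothetical opt-in macro, see the note above
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#endif

// Multiplies every element of values by factor.
void scaleAll(std::vector<float>& values, float factor) {
#ifdef USE_TBB
    // TBB splits the index range into sub-ranges and hands each one
    // to a worker thread; the lambda processes one sub-range.
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, values.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) values[i] *= factor;
        });
#else
    // Sequential fallback when TBB is not enabled.
    for (std::size_t i = 0; i < values.size(); ++i) values[i] *= factor;
#endif
}
```

Unlike a fixed split into equal chunks, TBB decides the sub-range sizes at run time, which tends to help when some iterations take longer than others.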

Hi @edapx !
With the addon I made there is no need to set anything up; the project generator does it all.
But then you need to actually use it. tbb::parallel_for is what I ended up using! That URL you shared is really useful.
cheers


Hey Roy, will ofxTbb work on Linux too, or just on Mac so far?

At the moment it is Mac only, because I only compiled it on my Mac.
But if you compile the TBB libraries on Linux you can add those to the addon. You just need to place the libs in the appropriate folder, libs/tbb/lib/linux and/or libs/tbb/lib/linux64, and update the includes if you are using a different version than the one I included.
Then the other super important part is that you add the necessary settings to the Linux section of the addon_config.mk file. If you compile them as static libs you won’t need to do much more than put the compiled libs in the aforementioned folders. Take a look at ofxOpenCv’s addon_config.mk file as a reference.

Awesome! I will have a go at this, but my knowledge about most of those things is close to awful (external libs, makefiles, etc). So, it will be a learning experience I’m sure. I love the idea of ofxTbb though, because OpenMP has its issues. And the code looks clean and easy to understand. And there are lots of times in oF when a parallel loop here and there would be super helpful. So, I’ll see if I can make ofxTbb work on Linux with something like the threadExample.

Thanks so much for your help; it will be fun to try to get it working.


If you get it to work please submit a pull request on the addon’s github.

TBB itself is a C++ library that works on Mac, Windows, and Linux. You can download it and then use it with something like:
#include "tbb/parallel_for.h"
You also need to tell Qt Creator the path of the library.

It turns out the TBB library is included with Linux Mint 19.x (Ubuntu 18.04). The Debian package is libtbb-dev, in case it’s not installed. The headers end up in /usr/include/tbb. So, it’s really easy to use by just including it as @edapx said above, or even with:

#include "tbb/tbb.h"

I also compiled it from source OK, but I struggled with the addon_config.mk file when trying to make an addon out of it like Roy had done. So I’ll fiddle with that a bit more.

Then I wrote a version of the threadExample which updates an ofImage using either TBB, OpenMP, or just the single application thread. In terms of ofGetFrameRate(), TBB and OpenMP had similar performance, with TBB just slightly better. TBB works great in Qt Creator with the following change in the .qbs file:

of.linkerFlags: ['-ltbb']      // flags passed to the linker
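For offline work where there is no frame rate to watch, the same kind of comparison can be made by timing each variant directly. The following is a small sketch using only the standard library; `elapsedMs` is a hypothetical helper, not part of openFrameworks or TBB.

```cpp
#include <chrono>
#include <functional>

// Runs work() once and returns how long it took, in milliseconds.
// steady_clock is used because it never jumps backwards.
double elapsedMs(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Passing the sequential, OpenMP, and TBB versions of the same loop to this helper in turn gives a rough relative measure; running each one several times and comparing the best or median time reduces noise.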

Oh, great that Ubuntu already ships it. So what you could do is create an empty addon, with no libs or src files and only the addon_config.mk file, in which only the following option is active:

ADDON_LDFLAGS = -ltbb

This should allow you to add TBB directly from the project generator without having to modify your .qbs file.
