Anyone here using OpenMP with OF?
I tried it today and didn't get very far. The main reason, I guess, was some incompatibility with normal threading.
I'd be interested in other people's experiences, because I'd like to switch from pthreads to OpenMP if it's worth the effort.
It's more for parallelizing loops, etc. (rather than managing individual threads), but as far as I understand, that is what OpenMP is mostly used for as well.
Today I was successful in using OpenMP. I converted my multithreaded app to a single-threaded one, then parallelized the for loops that do most of the processing, and it worked much better than the multithreaded version: it distributes the load evenly over the 8 cores of the Mac Pro.
The app processes the feeds of three webcams and does some heavy real-time image manipulation with OpenCV. Surprisingly, it's much faster to process the data of the three cameras sequentially and just have some parallel loops.
Once I got OpenMP working, the rest took just half an hour: adding a few
#pragma omp parallel for
here and there and checking the performance improvement. I had to remove one or two #pragmas because they slowed the program down; apparently the overhead outweighed the performance gain.
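One way to keep that overhead from hurting small loops is OpenMP's if clause: the loop only runs in parallel when the condition is true, otherwise it runs on one thread. This is a minimal sketch; the function name and the PARALLEL_THRESHOLD value are made up for illustration, and a real cutoff would have to be found by measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: below this many elements the loop runs on one
   thread, because starting a parallel region can cost more than the
   work it saves. The right value depends on the machine and the loop
   body, so it has to be found by measuring. */
#define PARALLEL_THRESHOLD 10000

/* Multiplies every element of data by factor. The if clause makes
   OpenMP execute the loop serially when count is small. */
void scale_buffer(float *data, size_t n, float factor)
{
    long count = (long)n;
    long i;
#pragma omp parallel for if(count > PARALLEL_THRESHOLD)
    for (i = 0; i < count; i++) {
        data[i] *= factor;
    }
}
```

The same loop body serves both cases, so dropping a #pragma that turned out to be too expensive becomes a matter of tuning one number instead of deleting the line.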
To use OpenMP on OS X you need Xcode 3.1: switch the compiler to GCC 4.2 in the project settings, tick the "use OpenMP" checkbox, and add some #pragmas. That's it.
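To confirm the setting actually took effect, a small check like this can help. Compilers define the _OPENMP macro only when OpenMP is enabled (GCC's command-line equivalent is -fopenmp); without it, the #pragma omp lines are ignored, at most with a warning, and the loops simply run serially. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if this file was compiled with OpenMP enabled, 0 otherwise.
   Compilers define _OPENMP only in that case; its value is the date
   (yyyymm) of the OpenMP specification they support. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line status report; handy to call once at startup. */
void report_openmp(void)
{
#ifdef _OPENMP
    printf("OpenMP enabled (_OPENMP = %d)\n", _OPENMP);
#else
    printf("OpenMP NOT enabled: #pragma omp lines are ignored\n");
#endif
}
```

Calling report_openmp() once at startup shows right away whether the pragmas are live or the project is still building without OpenMP.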
Wow, that's awesome! So OpenMP parallelizes for loops? How does that work? Do you write a kernel routine and send that to OpenMP or something, instead of a for loop? Do you need mutexes or anything?
No, it's much simpler. You just add a #pragma statement, and as long as your loops follow some simple rules (for example, no data dependencies between iterations) they get parallelized without any additional code, like in this example:
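#define N 100000

int main(int argc, char *argv[])
{
    int i, a[N];

    /* each iteration writes only its own a[i], so the iterations are
       independent and OpenMP can split them between threads */
#pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = 2 * i;

    return 0;
}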
The
#pragma omp parallel for
line is the OpenMP directive.
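The "no data dependencies" rule can be illustrated with two loops. This is a sketch with hypothetical function names: the first loop is the one from the example above and is safe to parallelize; the second reads the previous iteration's result, so it has to stay serial.

```c
#include <assert.h>

#define N 100000

/* Safe to parallelize: iteration i only writes its own a[i] and reads
   nothing written by other iterations, so the order does not matter. */
void fill_twice_index(int *a, int n)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = 2 * i;
}

/* NOT safe to parallelize as written: iteration i reads a[i - 1], which
   the previous iteration writes (a loop-carried dependency). With
   "#pragma omp parallel for" added, threads could read values that are
   not updated yet, giving wrong results that vary from run to run. */
void running_sum(int *a, int n)
{
    int i;
    for (i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```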
It's really easy and fun.
There's some additional support for mutexes and other things, but I didn't need that.
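For the common case where a loop accumulates into a single variable, OpenMP has a reduction clause, so no explicit mutex is needed; the critical directive is the lock-style alternative. A minimal sketch with hypothetical function names:

```c
#include <assert.h>

/* Adding into one shared variable from many threads is a data race.
   reduction(+:total) gives each thread a private copy of total and
   adds the copies together when the loop ends, so no lock is needed. */
long sum_with_reduction(const int *v, int n)
{
    long total = 0;
    int i;
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Lock-style alternative: "omp critical" lets only one thread at a time
   run the following statement. Correct, but the threads keep waiting
   for each other, so it is typically much slower than the reduction. */
long sum_with_critical(const int *v, int n)
{
    long total = 0;
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
#pragma omp critical
        total += v[i];
    }
    return total;
}
```

Both return 500500 for the values 1 to 1000; the reduction version is usually the one to reach for first.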
Wow, that's amazing. Does the pragma apply to the very next line? I'm wondering whether it's possible to have it apply to the for loops in an external library, like OpenCV...
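The directive binds to the for statement that immediately follows it (the whole loop, body included), not literally to the next line. Loops inside a precompiled library such as OpenCV are not split by a pragma in your own code, because the pragma only affects the loop it is attached to when that code is compiled. A sketch with a hypothetical nested pixel loop:

```c
#include <assert.h>

/* Inverts an 8-bit grayscale image stored row by row.
   The pragma applies to the whole "for (y ...)" statement below: the
   rows are divided between threads, and each thread runs the inner
   x loop for its own rows serially. The inner loop is not split. */
void invert_gray(unsigned char *pixels, int width, int height)
{
    int y;
#pragma omp parallel for
    for (y = 0; y < height; y++) {
        /* x is declared inside the parallel loop body, so every thread
           gets its own copy. A single x declared next to y would be
           shared between threads and cause a race. */
        int x;
        for (x = 0; x < width; x++)
            pixels[y * width + x] = (unsigned char)(255 - pixels[y * width + x]);
    }
}
```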
I am also really interested in getting my for loops parallelized. I did what you said, but when I change the GCC version to 4.2 it throws some errors:
8 of these: /Developer/SDKs/MacOSX10.4u.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory
If I only toggle "use OpenMP" and stick with GCC 4.0, I don't get the errors, but the speed doesn't seem to improve either.
Any ideas?
Thanks!
EDIT:
Okay, I read a little about all that stuff. As far as I can tell, you need GCC 4.2 to use OpenMP.
GCC 4.2 only works with the Mac OS X 10.5 base SDK. If I select that for my OF project, it throws a whole bunch of errors, since OF doesn't seem to like that SDK.
If I make an empty Cocoa project, everything works fine.
So how did you manage it? If it's too difficult, would you mind uploading an empty Xcode project using OF and OpenMP?
What an incredible change in performance. Even though I'm on a first-generation MBP (I think they have 2 cores, not sure though), I get a pretty big speed improvement for this simple loop:
// delete the leading // on the next line to switch OpenMP on for this loop
//#pragma omp parallel for
for (int i = 0; i < 1000000; i++) {
}
approx. 410 fps with OpenMP and 270 without it.
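One caveat with timing an empty loop: an optimizing compiler may remove it entirely, so the numbers can end up measuring something else. A comparison sketch that does some real work and keeps the result, using omp_get_wtime() when OpenMP is enabled; the function names and the i % 7 workload are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds via omp_get_wtime() when OpenMP is enabled.
   Without OpenMP, clock() (CPU time) is used instead, which roughly
   matches wall time because everything then runs on one thread. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical workload: sums i % 7 over n iterations. The result is
   returned and printed, so the loop's work is actually needed.
   if(use_parallel) switches OpenMP on or off for the same loop. */
long long busy_sum(int n, int use_parallel)
{
    long long total = 0;
    int i;
#pragma omp parallel for reduction(+:total) if(use_parallel)
    for (i = 0; i < n; i++)
        total += i % 7;
    return total;
}

/* Times the same loop serially and in parallel and prints both. */
void compare_timings(int n)
{
    double t0 = now_seconds();
    long long serial = busy_sum(n, 0);
    double t1 = now_seconds();
    long long parallel = busy_sum(n, 1);
    double t2 = now_seconds();
    printf("serial:   %.4f s (sum %lld)\n", t1 - t0, serial);
    printf("parallel: %.4f s (sum %lld)\n", t2 - t1, parallel);
}
```

Printing the sums alongside the times also shows at a glance that the parallel version computes the same result.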
Wow, that's super impressive. Excuse my naivety on these matters, but OpenMP is just being used to parallelize the for loops that do the image processing for 3 simultaneous camera feeds, and there are no shared-memory problems with that? Did you have to do anything special to get the camera data into a parallelizable state that's markedly different from what you would do for a multithreaded app?
The fun thing is: I didn't need to rewrite much of the program beyond adding the pragmas and making sure the for loops have no dependencies between iterations. Everything else is sequential, so the program processes the data of the three cameras one after another; only the processing loops run in parallel.
I think it really depends on what kind of loops you parallelize. I guess image processing works well because most of it is the same operation for each pixel.
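Per-pixel filters fit this pattern well as long as each output pixel is written to a separate buffer. A sketch of a hypothetical horizontal box blur that reads neighbouring pixels but still has no dependencies between iterations:

```c
#include <assert.h>

/* Horizontal 3-pixel box blur of an 8-bit grayscale image (row-major).
   Every output pixel reads its neighbours from src but is written only
   to dst, so no iteration depends on another one's result and the rows
   can be processed in any order. Blurring in place (src == dst) would
   break this: a pixel could read a neighbour that was already changed. */
void blur_rows(const unsigned char *src, unsigned char *dst, int width, int height)
{
    int y;
#pragma omp parallel for
    for (y = 0; y < height; y++) {
        const unsigned char *in = src + y * width;
        unsigned char *out = dst + y * width;
        int x;
        for (x = 0; x < width; x++) {
            /* Clamp at the row edges by repeating the border pixel. */
            int left = in[x > 0 ? x - 1 : x];
            int right = in[x < width - 1 ? x + 1 : x];
            out[x] = (unsigned char)((left + in[x] + right) / 3);
        }
    }
}
```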
However, OpenMP by default creates only one worker thread per core, so if you only have one or two cores the improvement may be much smaller than on the mighty 8-core Mac Pro.
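The thread count can be checked at runtime. The default team size is implementation-defined (GCC's runtime typically uses one thread per available processor) and can be overridden with the OMP_NUM_THREADS environment variable. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on the threads a "parallel for" will use by default. */
int default_thread_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Opens a parallel region and reports how many threads joined it. */
int team_size(void)
{
    int size = 1;
#ifdef _OPENMP
#pragma omp parallel
    {
        /* only one thread writes the shared variable */
#pragma omp single
        size = omp_get_num_threads();
    }
#endif
    return size;
}

/* Prints processor and thread counts (or that OpenMP is off). */
void print_thread_info(void)
{
#ifdef _OPENMP
    printf("%d processors available, %d threads per parallel region\n",
           omp_get_num_procs(), team_size());
#else
    printf("OpenMP disabled: every loop runs on 1 thread\n");
#endif
}
```

Running the program with OMP_NUM_THREADS=2 set in the environment, for example, should make team_size() report 2 regardless of how many cores the machine has.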
This is interesting. Does anyone have an example of how to turn on OpenMP support in OpenCV? Does it require editing anything, or is it already built into OpenCV?
I am also trying to use OpenMP to parallelize my huge image processing loops. I'm on Windows, using CB with MinGW 4.7.1, since those have the additional libraries needed to run oF. But that MinGW doesn't have OpenMP support (no lib files).
If I upgrade to MinGW 4.8.1, I can run OpenMP's sample program nicely, but the oF libs and addons built for MinGW 4.7.1 fail to run because of the version mismatch.
Any help is appreciated.
P.S: I tried looking at Intel TBB, but it is too confusing to use.