I’ve been playing with OpenCL lately and loving it. It really is gonna change a lot, I think. In true OpenGL style, it often takes quite a few steps to do some basic things (the flip side of being so versatile), so I’ve started an ofxOpenCL.

Due to the nature of OpenCL it’s not as plug-and-play as some other ofxClasses (you still need to know what you’re doing and how it works), but I think it’s still much simpler than OpenCL’s low-level C API (flexibility vs. simplicity is always a tricky one).

I’ve been testing this on Snow Leopard, but theoretically it should work on Windows as well, as long as you have an OpenCL implementation (e.g. recent NVIDIA drivers for a GeForce 8800 or newer).

If you wanna try it out, it’s at http://code.google.com/p/ofxmsaof/sourc-…-OpenCL/src

AMD also has a driver for running it on any CPU that supports SSE3.

I haven’t tested it, though, so your mileage may vary.

Nice, I will check that out as soon as I have a new MacBook Pro. What have you done with it so far? Particle stuff?

Hey, actually you could try installing Boot Camp and trying those AMD drivers Dingobloo posted. To benefit from using the GPU over the CPU you really do need to process a lot of data; otherwise the time lost to data transfer over the PCI bus outweighs the gains. I found on my MacBook Pro that I needed to do about 100 passes on a 1024x1024 float array before I got a performance gain from the GPU! (Of course it depends on the algorithm.) It does suck that Apple haven’t implemented a CPU fallback for older computers; they perfectly well could have. I don’t see why they haven’t, other than the fact that they wanna boost sales of new hardware.

I haven’t written a particle system yet (though I’ve looked at some of the samples). I’ve mainly been using it for number crunching / image processing etc. I really wish there were an OpenCL computer vision library; I guess I’m gonna have to roll my own :stuck_out_tongue: (OpenCL and Google don’t work very well together; Google keeps replacing OpenCL with OpenGL!)

If you put “-opengl” in your Google search, Google will exclude results containing “opengl”. Anyway, as far as I’ve heard, OpenCV plans on using OpenCL in the future. I think for interactive stuff like games OpenCL won’t make a big difference, since in those environments the GPU is already stuffed with graphics computations. But for things other than games, and apps that don’t already use the GPU intensively, OpenCL will be a great addition! (Maybe for physics stuff like Havok or Bullet in the future, though.)

Hey Memo, this is awesome! I wanted to try making a particle system with it right away, but I can’t get your example to compile. Do I have to change any project settings in the normal oF project in Xcode? Right now I get a bunch of linker errors (like “_clReleaseCommandQueue”, referenced from: ofxOpenCL::~ofxOpenCL() in ofxOpenCL.o).


Oh yeah, for Snow Leopard you need to add OpenCL.framework to your project and build for 10.6. I guess on Windows you’d need to add whatever libs your OpenCL implementation provides.

Okay, I managed to compile it now. But I think you might not have updated MyProgram.cl? You try to load a “square” kernel, but it’s not in MyProgram.cl, so the program can’t start…


Whoops, it’s in SVN now.

It just does this:

```c
__kernel void square(__global float *a, __global float *result) {
	int gid = get_global_id(0);
	result[gid] = a[gid] * a[gid];
}
```

The example is more of a hello world; it doesn’t really do anything useful. Most hello world examples I’ve seen run a single pass on the data and read it back to the host, though. I tried to do something a bit more useful by passing the data between two different kernels (in this particular example both operations could have been done in one kernel, but I just wanted to try passing data around while staying on the GPU).

P.S. I’m actually having a lot of fun playing with OpenCL just in Quartz Composer!

I just ran the example in CPU mode and compared it to OpenMP, and I wondered why OpenCL doesn’t multithread the computation the way OpenMP does. Any ideas why? GPU mode is still faster than OpenMP…

OpenCL itself doesn’t actually do anything; it is just an API. It’s the implementation, i.e. the drivers specific to your vendor, that decides how everything works. So what hardware / vendor did you try it on? With different OpenCL drivers it may behave quite differently. According to the specs, though, if you have a multicore CPU it should create threads for the job. Depending on the size of the workgroup, the CPU may give better performance than the GPU, since the cost of transferring the data to the GPU and reading it back can outweigh the benefits. Like I said, in my tests I found I had to process huge amounts of data to see any benefit of GPU over CPU!

I’m on a MacBook Pro 2.66 GHz Core 2 Duo… so I would assume that OpenCL would multithread the buffer on my CPU! Aren’t you supposed to tell it somewhere to use threads?

Hi Memo,

thanks for this, I’ve been having lots of fun with it.

I just made a few changes to the addon which I think would be nice to have included:

In ofxOpenCL:
added a writeBuffer() method, which basically calls clEnqueueWriteBuffer

In ofxOpenCLKernel:
changed the addArg(…) method to use a template, so you can pass a generic argument to the kernel
added a getCLKernel() method to access the cl_kernel (kinda like getCVImage() in the OpenCV addon)

In ofxOpenCLProgram:
added a getCLProgram() method to access the cl_program

The addon with the changes is attached.

And as a follow-up to Rui, I can show a picture of a quick benchmark app I made, based on the example code, to see the difference on my system.

Note that I’m able to run it on both of my GPUs! I modified some code in the addon to do that; I’ll try to tidy it up and share the changes.

Rui and I are working together here in Copenhagen these days, and we are experimenting with some OpenCL stuff, so let’s see if something comes out of it :slight_smile:

MacBook Pro 2.66GHz Intel Core 2 Duo
4 GB RAM

GPU 1: GeForce 9400M 256MB
GPU 2: GeForce 9600M GT 512MB

I made a test with OpenCL where 5000 particles interact with every other particle and with the mouse cursor.
It’s running only on the CPU (my graphics card doesn’t support OpenCL), but I would love to see it running on a proper GPU, hehe.

video here: http://vimeo.com/7298380
and source here: http://code.google.com/p/ruisource/


This rocks, braaa :slight_smile:

I need a new Mac!!! NOW!!!

Wow, I just took a look at the source. Really slim too!

Wow, nice! Can you post the “MyProgram.cl” source too? Am I looking in the wrong place? I guess it would be in the data folder of the particles example, but I only see the source.

I’m very curious to see how it’s done

BTW, here’s some very fast particle-particle code:


It might be interesting to see if OpenCL pushes this further (I had 8-10k particles interacting well).

take care,

D’oh! I forgot to include the OpenCL program! :slight_smile:


When running pelintra’s example I get 30fps when running


10fps with


and also 10fps when running


Did I miss anything? Any ideas? I’m on an Nvidia 9400 mini with Snow Leopard, and I thought this would be supported.


In that code I read back from the OpenCL buffer on every frame, and copying data back and forth from the GPU is very heavy. That could explain why you get slower frame rates in GPU mode.

Jonas Jongejan is working on some particles that are uploaded to a VBO at startup; OpenCL then updates the VBO directly, so you never have to copy data back and forth.

The code I posted is just an early experiment, trying out something new. I’m sure there are far better ways of doing things. :slight_smile: