Performance with floating-point accumulation images

Hello stackoverflow,

I need to speed up some particle system eye candy I’m working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I’m rendering by hand into a floating-point image buffer, converting to unsigned chars at the last minute, then uploading to an OpenGL texture. To simulate glow I’m rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I’m looking at changing something. The problem is that my dev hardware is an Intel GMA950 but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.
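For reference, the current CPU path looks roughly like the sketch below. This is a simplified illustration, not the actual code: the function names `accumulate_rgb` and `floats_to_bytes` are made up for this post, the buffer is assumed to be interleaved RGB, values are assumed finite, and 1.0 is assumed to mean full intensity.

```c
#include <stddef.h>
#include <stdint.h>

/* Additively blend one RGB sample into a float accumulation buffer.
   buf holds width*height*3 interleaved floats; (x, y) is assumed in range. */
void accumulate_rgb(float *buf, size_t width, size_t x, size_t y,
                    float r, float g, float b)
{
    float *p = buf + (y * width + x) * 3;
    p[0] += r;
    p[1] += g;
    p[2] += b;
}

/* Convert the float buffer to 8-bit just before the texture upload.
   Values are clamped to [0, 1] and rounded to the nearest byte. */
void floats_to_bytes(const float *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        float v = src[i];
        if (v < 0.0f) v = 0.0f;
        if (v > 1.0f) v = 1.0f;
        dst[i] = (uint8_t)(v * 255.0f + 0.5f);
    }
}
```

The conversion loop touches every one of the 1024*768*3 floats each frame, which is part of why this path gets expensive.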

I’m looking at the following options:

* Replace floats with uint32s in a fixed-point 16.16 configuration
* Optimize float operations using SSE2 assembly (image buffer is a 1024*768*3 array of floats)
* Use OpenGL Accumulation Buffer instead of float array
* Use OpenGL floating-point FBOs instead of float array
* Use OpenGL pixel/vertex shaders
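To make the first option concrete, here is a minimal sketch of what 16.16 fixed-point accumulation might look like. The helper names (`fix_from_float`, `fix_add_sat`, `fix_to_byte`) are hypothetical, and it assumes non-negative channel values with 1.0 as full intensity.

```c
#include <stdint.h>

#define FIX_ONE 65536u  /* 1.0 in 16.16 fixed point */

/* Convert a non-negative float (assumed below 65536) to 16.16. */
uint32_t fix_from_float(float v)
{
    return (uint32_t)(v * 65536.0f + 0.5f);
}

/* Saturating add, so heavy additive blending clamps instead of wrapping. */
uint32_t fix_add_sat(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;
    return (s < a) ? UINT32_MAX : s;
}

/* Map a 16.16 value to an 8-bit channel, clamping at 1.0. */
uint8_t fix_to_byte(uint32_t v)
{
    if (v >= FIX_ONE) return 255;
    return (uint8_t)((v * 255u + 32768u) >> 16);
}
```

The accumulation itself becomes an integer add, and the final conversion is a multiply and a shift rather than a float-to-int conversion.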

Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven’t thought of?

The last two are definitely going to give the best performance, especially on the 8800 (it can process up to 128 pixels in parallel). Effectively you will do the same thing as now: render to an FBO, then render that FBO to a smaller texture using a blur shader (and repeat if need be). There are loads of examples around of simple, super-fast blurs/glows/bloom etc., but your problem is going to be developing and testing for the 8800 on a GMA950! I’m not even sure whether it supports shaders.
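Since testing shaders on the GMA950 may be awkward, a CPU reference of what one downsample-and-glow pass computes can help check results later. The sketch below is only that: plain C, not GPU code, with hypothetical function names, a single channel, a 2x2 box filter standing in for the blur shader, and nearest-neighbour upsampling when the glow is added back.

```c
#include <assert.h>
#include <stddef.h>

/* Halve a single-channel float image by averaging each 2x2 block.
   src is w*h floats, dst is (w/2)*(h/2) floats; w and h must be even. */
void downsample_2x2(const float *src, size_t w, size_t h, float *dst)
{
    assert(w % 2 == 0 && h % 2 == 0);
    size_t dw = w / 2, dh = h / 2;
    for (size_t y = 0; y < dh; ++y) {
        for (size_t x = 0; x < dw; ++x) {
            const float *p = src + (2 * y) * w + 2 * x;
            dst[y * dw + x] = 0.25f * (p[0] + p[1] + p[w] + p[w + 1]);
        }
    }
}

/* Additively blend the half-resolution glow back onto the full image,
   using nearest-neighbour upsampling for simplicity. */
void add_glow(float *image, size_t w, size_t h,
              const float *glow, float strength)
{
    size_t gw = w / 2;
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x)
            image[y * w + x] += strength * glow[(y / 2) * gw + (x / 2)];
}
```

Repeating `downsample_2x2` on its own output gives the progressively smaller, blurrier levels; on the GPU each level would be one render into a smaller FBO-attached texture.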