I need to speed up some particle-system eye candy I’m working on. It involves additive blending, accumulation, and trails and glow on the particles. At the moment I’m rendering by hand into a floating-point image buffer, converting it to unsigned chars at the last minute, and then uploading the result to an OpenGL texture. To simulate glow, I render the same texture multiple times at different resolutions and different offsets. This is proving too slow, so I’m looking at changing something. The problem is that my dev hardware is an Intel GMA950 while the target machine has an Nvidia GeForce 8800, so it is difficult to profile the OpenGL side at this stage.
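For reference, the CPU path described above boils down to roughly the following. This is a minimal sketch, not the actual code: the `FloatImage` struct and all function names are hypothetical, and each particle is simplified to a one-pixel additive splat (real particles would cover several pixels). The trail fade is likewise an assumption about how "accumulation and trails" might be done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the float RGB buffer (e.g. 1024x768x3). */
typedef struct {
    float *data;     /* width * height * 3 floats, RGB interleaved */
    int width;
    int height;
} FloatImage;

/* Additively blend one particle's colour into a single pixel.
 * Values may exceed 1.0f; clamping happens only at conversion time. */
static void splat_additive(FloatImage *img, int x, int y, const float rgb[3])
{
    assert(img != NULL && img->data != NULL);
    if (x < 0 || x >= img->width || y < 0 || y >= img->height)
        return; /* particle is off-screen */
    float *px = img->data + ((size_t)y * (size_t)img->width + (size_t)x) * 3;
    for (int c = 0; c < 3; ++c)
        px[c] += rgb[c];
}

/* Multiply every channel by decay (0..1) so earlier frames leave trails. */
static void fade_trails(FloatImage *img, float decay)
{
    size_t n = (size_t)img->width * (size_t)img->height * 3;
    for (size_t i = 0; i < n; ++i)
        img->data[i] *= decay;
}

/* Clamp to [0, 1] and convert to 8-bit; out must hold width*height*3 bytes.
 * The result is laid out for a GL_RGB / GL_UNSIGNED_BYTE texture upload. */
static void float_to_u8(const FloatImage *img, unsigned char *out)
{
    size_t n = (size_t)img->width * (size_t)img->height * 3;
    for (size_t i = 0; i < n; ++i) {
        float v = img->data[i];
        if (v < 0.0f) v = 0.0f;
        if (v > 1.0f) v = 1.0f;
        out[i] = (unsigned char)(v * 255.0f + 0.5f);
    }
}
```

Each frame then touches every one of the 1024×768×3 floats at least twice (fade and conversion) before the upload, for example via `glTexSubImage2D` with `GL_RGB` and `GL_UNSIGNED_BYTE`, which is a large part of why a CPU-side buffer tends to be the bottleneck.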
I’m looking at the following options:
* Replace floats with uint32s in a fixed-point 16.16 configuration
* Optimize the float operations using SSE2 assembly (the image buffer is a 1024×768×3 array of floats)
* Use the OpenGL accumulation buffer instead of the float array
* Use OpenGL floating-point FBOs instead of the float array
* Use OpenGL pixel (fragment) and vertex shaders
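As a rough illustration of the first option, here is a minimal sketch of unsigned 16.16 fixed-point accumulation. It assumes that 1.0 (`0x10000`) means full intensity and that values above it are clamped only when converting to 8-bit. The type name `fix16_16` and the helper names are hypothetical. Saturating adds stop heavy additive overdraw from wrapping around to black, which is the main pitfall when moving additive blending from floats to integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for an unsigned 16.16 fixed-point value.
 * Assumption: FIX_ONE (1.0) means full intensity. */
typedef uint32_t fix16_16;
#define FIX_ONE ((fix16_16)1u << 16)

/* Convert a non-negative float, rounding to nearest and saturating. */
static fix16_16 fix_from_float(float f)
{
    if (f <= 0.0f) return 0;
    if (f >= 65535.0f) return UINT32_MAX;
    return (fix16_16)(f * 65536.0f + 0.5f);
}

/* Additive blend with saturation, so heavy overdraw can't wrap around. */
static fix16_16 fix_add_sat(fix16_16 a, fix16_16 b)
{
    return (a > UINT32_MAX - b) ? UINT32_MAX : a + b;
}

/* Scale a by factor (0..1 in 16.16) using a 64-bit intermediate product. */
static fix16_16 fix_scale(fix16_16 a, fix16_16 factor)
{
    assert(factor <= FIX_ONE); /* guarantees the result fits in 32 bits */
    return (fix16_16)(((uint64_t)a * factor) >> 16);
}

/* Clamp to 1.0 and map to 0..255 with rounding. */
static unsigned char fix_to_u8(fix16_16 v)
{
    if (v > FIX_ONE) v = FIX_ONE;
    return (unsigned char)((v * 255u + 0x8000u) >> 16);
}

/* One pass per frame: emit 8-bit output, then fade the buffer for trails. */
static void convert_and_fade(fix16_16 *buf, unsigned char *out, size_t n,
                             fix16_16 decay)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = fix_to_u8(buf[i]);
        buf[i] = fix_scale(buf[i], decay);
    }
}
```

Folding the trail fade into the conversion pass means the buffer is walked once per frame instead of twice. Whether integer math actually beats floats here depends on the CPU and compiler, so it would still need measuring on the target machine.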
Have you any experience with any of these possibilities? Any thoughts or advice? Is there something else I haven’t thought of?