OpenCL/CUDA optimisation notes in paper (particles, turbulence)

Hi folks, I am currently on a mission to try and work out how to code this. My math is poor and I have yet to find any implementations. The only lead so far is noticing that one of the authors is now a research engineer at NVIDIA, working on their new particle turbulence engine for APEX (or whatever they call it). The car video is excellent to watch, and then you realize it’s real time.

Anyway, whilst looking through other related papers I found this excellent advice on coding for CUDA/OpenCL, starting on page 5 (“Implementation”). It basically discusses what to run on the GPU, what to keep on the CPU, and what not to do.

Good advice for sure.