I have about 300,000 points that I need to rotate every frame.
First I was rotating the points in a for loop, but that was pretty slow. I then made a compute shader, but it's still pretty slow: it takes about 8-9 ms. It seems that the map function (glMapBuffer) of the ofBufferObject is what is slowing it down.
Is there a faster way to transfer data between the host program and the GPU than using the map function?
Is there some other way to rotate the points that would be faster?
(In my case the points are a vector<ofVec3f> that becomes vec4 on the GPU, because NV only supports even entries, and is converted back to vec3 when retrieved.)
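For reference, the vec3-to-vec4 round trip described above could look roughly like this on the CPU side. This is only a sketch in plain C++: `Vec3`, `Vec4`, `padToVec4` and `stripToVec3` are hypothetical stand-ins, not openFrameworks types or functions. Note that the std140 and std430 buffer layouts align a vec3 to 16 bytes, so padding to four floats is needed whenever an array of points is uploaded to an SSBO.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for ofVec3f and a GPU-side vec4.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// One Vec4 should occupy exactly 16 bytes, matching a GLSL vec4.
static_assert(sizeof(Vec4) == 16, "Vec4 must be 16 bytes");

// Pad every point to four floats before uploading (w is set to 1).
std::vector<Vec4> padToVec4(const std::vector<Vec3>& in) {
    std::vector<Vec4> out;
    out.reserve(in.size());
    for (const Vec3& p : in) {
        out.push_back({p.x, p.y, p.z, 1.0f});
    }
    assert(out.size() == in.size());
    return out;
}

// Drop the fourth component again after reading the buffer back.
std::vector<Vec3> stripToVec3(const std::vector<Vec4>& in) {
    std::vector<Vec3> out;
    out.reserve(in.size());
    for (const Vec4& p : in) {
        out.push_back({p.x, p.y, p.z});
    }
    assert(out.size() == in.size());
    return out;
}
```

Both conversions copy all 300,000 points every frame, so keeping the data as four floats on the CPU side as well would avoid that extra pass.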
If the rotation is the same for all, maybe use ofRotate()?
The CPU-to-GPU pass is always the bottleneck, so avoid it if you can.
Why the need to get the points back from the GPU? That’s usually even slower.
Did you try using a VBO, in this case an ofVboMesh? It is pretty fast.
Do you need to rotate each point by a different amount? If so, you can pass extra data to the ofVboMesh on each frame as an array using
I need to get the points back for a process that depends on the points being correctly rotated. They are all rotated by the same amount. I am just using a simple mat4 for that in the shader.
Can you tell me why the ofVboMesh is faster than sending the points with an SSBO, and would I need a buffer to transfer them back to the host program again? I am very interested in optimizing processes with compute shaders, so it would be nice to know.
Is there any point in using a VBO, with a bigger data structure than a vector<ofVec3f>, when I only need to transfer points?
I thought you wanted to draw this. So there is no need to use the VBO, but at the same time pulling the data back to the CPU is going to be a slowdown.
I think that the compute shader is the way to go. Maybe OpenCL or CUDA could benefit but you’ll have to test.
Did you check the computeShader example?
Good luck with this!
Are you rotating all the points by the same amount? I was trying to rotate all vertices in a mesh (a point cloud) with ofVec3f::rotate(), which was pretty slow for 215k points. Looking into that function, there are a bunch of sin()/cos() calls, which I think is the slowest part.
Instead, I used a matrix, called rotate() on that, and then multiplied each point by the matrix, which was much, much faster since all of the trigonometry calls only happen once. Maybe that could work for your situation?
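To illustrate the idea, here is a minimal sketch in plain C++ rather than with the openFrameworks classes. `Vec3`, `Mat3`, `makeRotation` and `rotateAll` are hypothetical names made up for this example. The matrix is built once from an axis and an angle (Rodrigues' formula), so sin() and cos() are each called a single time, and every point then costs only nine multiplications and six additions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-ins for the openFrameworks vector and matrix types.
struct Vec3 { float x, y, z; };
struct Mat3 { float m[3][3]; };

// Build an axis-angle rotation matrix once; the trigonometry happens only here.
Mat3 makeRotation(Vec3 axis, float angleRad) {
    float len = std::sqrt(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
    assert(len > 0.0f);  // the axis must not be the zero vector
    float x = axis.x / len, y = axis.y / len, z = axis.z / len;
    float c = std::cos(angleRad), s = std::sin(angleRad), t = 1.0f - c;
    return Mat3{{
        { t * x * x + c,     t * x * y - s * z, t * x * z + s * y },
        { t * x * y + s * z, t * y * y + c,     t * y * z - s * x },
        { t * x * z - s * y, t * y * z + s * x, t * z * z + c     }
    }};
}

// Rotate every point in place with the precomputed matrix (no trig calls).
void rotateAll(std::vector<Vec3>& pts, const Mat3& r) {
    for (Vec3& p : pts) {
        Vec3 q = p;  // keep the original components while overwriting p
        p.x = r.m[0][0] * q.x + r.m[0][1] * q.y + r.m[0][2] * q.z;
        p.y = r.m[1][0] * q.x + r.m[1][1] * q.y + r.m[1][2] * q.z;
        p.z = r.m[2][0] * q.x + r.m[2][1] * q.y + r.m[2][2] * q.z;
    }
}
```

Calling `makeRotation` once per frame and then `rotateAll` on the whole vector keeps everything on the CPU, which also sidesteps the upload and readback. Whether that beats the compute shader for 300,000 points is something you would have to measure.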