Bottleneck when sending large UBO to shader

Whenever I run the following program I hit a bottleneck and the frame rate drops to 30 fps. I’m using an ofxUbo shader, which can receive uniform buffer objects.

All it really does is store the positions and colors of 2000 particles in a uniform buffer object and send them to a fragment shader drawn over a rectangle covering the entire window. For every pixel, the fragment shader then adds up the color contribution of each particle and outputs the result.
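To make that per-pixel work explicit, here is a rough CPU sketch in plain C++ of what the fragment shader computes for a single pixel. `Vec2`, `Vec3`, `Vec4` and `shadePixel` are hypothetical names for illustration only, not part of ofxUbo or openFrameworks.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ stand-ins for GLSL vec2/vec3/vec4.
struct Vec2 { float x, y; };
struct Vec3 { float r, g, b; };
struct Vec4 { float r, g, b, a; };

// CPU mirror of the fragment shader body for ONE pixel: every particle
// contributes color * (0.1 / distance), so the cost per pixel grows
// linearly with the particle count.
Vec4 shadePixel(Vec2 pixel, const std::vector<Vec2>& positions,
                const std::vector<Vec3>& colors) {
    assert(positions.size() == colors.size());
    Vec4 canvas{0.0f, 0.0f, 0.0f, 0.0f};  // start from black
    for (std::size_t i = 0; i < positions.size(); ++i) {
        float dx = positions[i].x - pixel.x;
        float dy = positions[i].y - pixel.y;
        float amp = 0.1f / std::sqrt(dx * dx + dy * dy);  // inverse-distance falloff
        canvas.r += colors[i].r * amp;
        canvas.g += colors[i].g * amp;
        canvas.b += colors[i].b * amp;
        canvas.a += amp;  // the shader adds vec4(color, 1.0) * amp, so alpha sums amp
    }
    return canvas;
}
```

The GPU runs this body once per pixel of the window, every frame, which is where the cost comes from.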

ofApp.h: (omitting the default functions)

    ofxUboShader shader;

    struct positionData {
        ofVec2f position[2000];
        ofVec3f color[2000];
    } data;

ofApp.cpp:

    void ofApp::setup(){
        ofBackground(ofColor::black);

        shader.setupShaderFromFile(GL_FRAGMENT_SHADER, "shader.frag");
        shader.linkProgram();
    }

    void ofApp::update(){
        for (int i=0; i<2000; ++i) {
            data.position[i].set(ofRandom(0.0, ofGetWindowWidth()), ofRandom(0.0, ofGetWindowHeight()));
            data.color[i].set(1.0, 1.0, 1.0);
        }
    }

    void ofApp::draw(){
        shader.begin();
            shader.setUniformBuffer("positionData", data);

            ofRect(0.0, 0.0, ofGetWindowWidth(), ofGetWindowHeight());
        shader.end();
    }

shader.frag:

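    #version 150
    // uniform blocks need GLSL 1.40+, fragment "out" variables need 1.30+

    out vec4 outputColor;

    uniform positionData
    {
        vec2 position[2000];
        vec3 color[2000];
    };

    void main()
    {
        vec4 canvas = vec4(0.0);  // start from black; an uninitialized vec4 is undefined
        for (int i=0; i<2000; ++i) {
            float amp = 0.1 / distance(position[i].xy, gl_FragCoord.xy);
            canvas += vec4(color[i], 1.0) * amp;
        }

        outputColor = canvas;
    }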
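Separate from the speed problem, it may be worth checking that the C++ struct and the GLSL block agree on memory layout. The block above has no layout qualifier, so it uses the implementation-defined `shared` layout, unless the addon queries the driver's offsets itself (an assumption worth checking). Under the standard `std140` layout, every element of a `vec2` or `vec3` array occupies 16 bytes, whereas `ofVec2f` and `ofVec3f` are packed floats (8 and 12 bytes). The sketch below assumes the block were declared `layout(std140)`; `PaddedVec2`, `PaddedVec3` and both struct names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical C++ mirror of the block IF it were declared as
//   layout(std140) uniform positionData { vec2 position[2000]; vec3 color[2000]; };
// std140 rounds the stride of vec2/vec3 array elements up to 16 bytes (one vec4).
constexpr std::size_t kNumParticles = 2000;

struct PaddedVec2 { float x, y, pad0, pad1; };  // 8 bytes of data + 8 of padding
struct PaddedVec3 { float r, g, b, pad; };      // 12 bytes of data + 4 of padding

static_assert(sizeof(PaddedVec2) == 16, "std140 vec2 array stride");
static_assert(sizeof(PaddedVec3) == 16, "std140 vec3 array stride");

struct PositionDataStd140 {
    PaddedVec2 position[kNumParticles];  // offset 0, stride 16
    PaddedVec3 color[kNumParticles];     // offset 32000, stride 16
};

// What the ofApp.h struct holds: tightly packed floats (8 and 12 bytes per element).
struct PositionDataPacked {
    float position[kNumParticles][2];  // offset 0, stride 8
    float color[kNumParticles][3];     // offset 16000, stride 12
};
```

The two versions disagree on where `color` starts (32000 vs 16000 bytes), and the padded block is 64000 bytes, well above the 16384 bytes OpenGL guarantees for `GL_MAX_UNIFORM_BLOCK_SIZE` (many desktop GPUs report 65536; query it with `glGetIntegerv`).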

Is this a bad implementation? I thought it was easy to create large particle systems. What am I doing wrong? Should I use geometry or compute shaders instead?

Hey, sorry, I didn't see your post. I wrote that addon; it's easier to get my attention on GitHub, or you can ping me.

First off, if you want to do particle systems, this is not the way to go. You want to create a VBO for that: color and position are per-vertex attribute data, not uniform data. You can also update the positions in the vertex shader for extra speed.
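As a rough sketch of what "attribute data" means here: each particle becomes one vertex, and its position and color live in the vertex buffer instead of a uniform block. The struct below is a hypothetical interleaved layout (the `velocity` attribute is an added assumption); its size and member offsets are what you would pass as the stride and offset to `glVertexAttribPointer`. In openFrameworks, `ofVbo` or `ofVboMesh` can manage this for you. `animatedPosition` mirrors on the CPU what a vertex shader could compute per vertex from a time uniform.

```cpp
#include <cstddef>

// Hypothetical interleaved vertex: one vertex per particle in a single VBO.
struct ParticleVertex {
    float position[2];  // attribute: starting position in pixels
    float velocity[2];  // attribute: pixels per second (hypothetical extra attribute)
    float color[3];     // attribute: RGB
};

// Stride and byte offsets you would pass to
// glVertexAttribPointer(index, size, GL_FLOAT, GL_FALSE, stride, (void*)offset).
constexpr std::size_t kStride = sizeof(ParticleVertex);
constexpr std::size_t kPositionOffset = offsetof(ParticleVertex, position);
constexpr std::size_t kVelocityOffset = offsetof(ParticleVertex, velocity);
constexpr std::size_t kColorOffset = offsetof(ParticleVertex, color);

struct Point2 { float x, y; };

// CPU mirror of what a vertex shader could compute for each vertex given a
// "time" uniform, so the CPU does not have to rewrite the buffer every frame.
constexpr Point2 animatedPosition(ParticleVertex v, float timeSeconds) {
    return Point2{v.position[0] + v.velocity[0] * timeSeconds,
                  v.position[1] + v.velocity[1] * timeSeconds};
}
```

The design point is that the per-particle math now runs once per vertex rather than once per pixel for every particle.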

That being said, your problem is not the uniform buffers; it is your fragment shader. Large loops like that inside shaders are highly inefficient: the loop runs 2000 iterations for every pixel in the window, every frame. That is why it is so slow.
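To put numbers on that, the small helpers below count the inner-loop iterations of the full-window approach. The window size is an assumption, since the post doesn't state it; the function names are hypothetical.

```cpp
#include <cstdint>

// Inner-loop iterations per frame for the full-window approach: one
// fragment per pixel, and every fragment loops over all particles.
constexpr std::uint64_t fragmentLoopIterations(std::uint64_t width,
                                               std::uint64_t height,
                                               std::uint64_t particles) {
    return width * height * particles;
}

// Same quantity per second at a given frame rate.
constexpr std::uint64_t iterationsPerSecond(std::uint64_t perFrame, std::uint64_t fps) {
    return perFrame * fps;
}
```

Assuming a 1024×768 window, that is 786,432 pixels × 2000 particles = 1,572,864,000 loop iterations per frame, or about 94 billion per second at 60 fps. With one vertex per particle, the per-particle work instead runs 2000 times per frame, plus whatever fragments each drawn point covers.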