Shader loop slower than texture.drawSubsection() loop

What I want to achieve is to replace parts of a texture with other parts of the same texture (to shuffle puzzle pieces).
I have three methods. One is a shader that I wrote expecting it to be faster than the other solutions, but it is not.
Here is the shader: ofEmscriptenExamples/random.frag at main · Jonathhhan/ofEmscriptenExamples · GitHub
This is method two: glCopyTexSubImage2D - OpenGL 4 Reference Pages
And this is method three (which is, surprisingly, the fastest):

	fbo.begin();
	for (int i = 0; i < xPieces * yPieces; i++) {
		fboImg.getTexture().drawSubsection(i % xPieces * puzzlePieceWidth, i / xPieces * puzzlePieceHeight, puzzlePieceWidth, puzzlePieceHeight, data[i] % xPieces * puzzlePieceWidth, data[i] / xPieces * puzzlePieceHeight);
	}
	fbo.end();
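For reference, the index math in this loop is plain integer arithmetic: piece i is drawn into grid cell (i % xPieces, i / xPieces), and its content comes from cell data[i]. As long as data is a permutation of 0..xPieces*yPieces-1, every piece appears exactly once. Here is a minimal C++ sketch of that bookkeeping, assuming data is meant to be such a permutation; makeShuffledData(), rectsFor() and PieceRects are hypothetical helpers, not openFrameworks API:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper type: origins (in pixels) for one puzzle piece.
struct PieceRects {
    int dstX, dstY; // top-left corner where the piece is drawn in the fbo
    int srcX, srcY; // top-left corner of the region copied from fboImg
};

// A shuffled permutation of 0..n-1: every piece index appears exactly once,
// so no piece can be duplicated or lost.
std::vector<int> makeShuffledData(int xPieces, int yPieces, unsigned seed) {
    assert(xPieces > 0 && yPieces > 0);
    std::vector<int> data(xPieces * yPieces);
    std::iota(data.begin(), data.end(), 0); // 0, 1, 2, ...
    std::mt19937 rng(seed);
    std::shuffle(data.begin(), data.end(), rng);
    return data;
}

// The same index math as the drawSubsection() arguments in the loop above.
PieceRects rectsFor(int i, const std::vector<int>& data, int xPieces,
                    int pieceW, int pieceH) {
    assert(i >= 0 && i < static_cast<int>(data.size()));
    return {i % xPieces * pieceW, i / xPieces * pieceH,
            data[i] % xPieces * pieceW, data[i] / xPieces * pieceH};
}
```

rectsFor(i, data, xPieces, puzzlePieceWidth, puzzlePieceHeight) yields the destination and source origins that the loop passes to drawSubsection() together with the piece width and height.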

Now I wonder whether I am doing something wrong in my shader. Maybe it is the for loop with a variable length?
I found this about possible limitations: The Shader Permutation Problem - Part 1: How Did We Get Here?

For comparison, this version uses the shader (the shader is evaluated when you press play):
https://puzzletest2.handmadeproductions.de/
It has some additional features, but it is slower even without them.
This uses method three, which seems to be the fastest:
https://puzzletest1.handmadeproductions.de/

And a side question: right now I use a texture's alpha channel (made from an .svg file) to mask the puzzle pieces (in this version: https://puzzletest2.handmadeproductions.de/). Would it be possible (and more accurate) to do the masking in the shader / FBO directly with the .svg file?


So I found out that the shader version is faster with a good GPU, while the texture.drawSubsection() version seems to be faster with a weak GPU.

Hey @Jona, I played a couple of rounds. Fun! OK, so on the shader: yes, I feel like both a loop and texture() calls can slow a shader down.

There are probably several ways to eliminate the loop. One of them might be to use a pass-thru copy of texcoord with floor() and fract() to break the normalized texture coordinates into “tiles”:

// copy the pass-thru of texcoord from the vertex shader; range 0.0 - 1.0
vec2 tc = vTexcoord;

// make some tiles, 3 x 3 in this case; range is now 0.0 - 3.0
tc *= vec2(3.0, 3.0);

// floor() to get values of 0.0, 1.0, 2.0; perhaps int() these to index into an array of offset values
vec2 floorTC = floor(tc);

// fract() to get 9 sets of values that again range from 0.0 - 1.0, ready to be offset, scaled, or otherwise used in calculations
vec2 fractTC = fract(tc);
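The same floor()/fract() split can be checked on the CPU. This is a small C++ sketch of one axis; splitIntoTiles() and Tile are made-up names, and in the shader you would do this on the whole vec2 at once:

```cpp
#include <cassert>
#include <cmath>

// One axis of the tiling idea: a normalized coordinate (0..1) becomes a
// tile index (like GLSL floor()) plus a position inside that tile (like fract()).
struct Tile {
    float index; // 0.0, 1.0, ..., tiles - 1
    float local; // 0.0 .. 1.0 inside the tile
};

Tile splitIntoTiles(float tc, float tiles) {
    assert(tiles > 0.0f);
    float scaled = tc * tiles;        // range 0.0 .. tiles
    float index = std::floor(scaled); // which tile
    float local = scaled - index;     // fract(): position inside the tile
    return {index, local};
}
```

With 3 tiles, tc = 0.5 lands in tile 1 at local position 0.5, i.e. the middle of the middle tile.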

I’ve used this design pattern a lot to remap the incoming values of normalized texture coordinates. I hope you won’t have any GLES limitations with this.


Hey @TimChi, thank you. That looks very promising.

Hey, sure! I’m curious to see whether you notice a difference in speed. The .drawSubsection() approach is nice in many ways too. It’s simple and clear, and it may suit a wider range of hardware configurations (like those with a modest GPU).

@TimChi I guess I am close to a solution, but there are some issues. The good news is that float textures work with Emscripten, and your suggestion is much faster. But there are some duplicate tiles and some strange patterns; it seems there is an error in reading the texture.
Here is the shader: ofEmscriptenExamples/random.frag at main · Jonathhhan/ofEmscriptenExamples · GitHub
And here is the example (just for showing the issue, otherwise it has no functionality; the small rectangle in the upper left corner represents the data texture): https://puzzletest3.handmadeproductions.de/

What actually makes a difference is changing this line of the shader:
vec2 offset1 = floor(texture(texture_data, counter).rg * puzzlePieces) * puzzlePieceSize;
To:
vec2 offset1 = floor(texture(texture_data, counter + 0.5 / resolution).rg * puzzlePieces) * puzzlePieceSize;
This should sample the “center” of a pixel.
Now the pattern looks correct, but there are still some duplicated pieces (every puzzle piece should exist only once).
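A quick way to see why the + 0.5 / resolution change helps, assuming `counter` points at the corner of a data texel (i / resolution) and `resolution` is the size of texture_data: texel centers sit at (i + 0.5) / n, and floor(uv * n) maps a center back to i. A C++ sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

// Normalized coordinate of the center of texel i in a texture n texels wide.
// Sampling exactly here with linear filtering returns the texel's own value.
float texelCenter(int i, int n) {
    assert(n > 0 && i >= 0 && i < n);
    return (static_cast<float>(i) + 0.5f) / static_cast<float>(n);
}

// Inverse: which texel does a normalized coordinate fall into?
int texelIndex(float uv, int n) {
    assert(n > 0);
    int i = static_cast<int>(std::floor(uv * static_cast<float>(n)));
    return i < 0 ? 0 : (i >= n ? n - 1 : i); // clamp, like GL_CLAMP_TO_EDGE
}
```

A coordinate of exactly i / n sits on the shared edge of texels i - 1 and i, which is where a linear lookup mixes the two.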

Hey, I just played a round of what is currently at the link in the original post.

And there were some duplicate pieces to start with. But clicking on a piece restores it, and the duplicates disappeared after clicking all 8 pieces! So it is pretty close and seems to be working great except for the initial randomization. Plus I saw puzzle shapes this time! The first time I played it, I saw only rectangles. Nice!

One thing I love to do is to use 2 (or more) ofFbos that are all the same size as the OF window. Then I’ll often .draw() an image into one of them and send that into a shader, which is a fast way to crop and resize.

And with .draw() comes its texcoord, which is passed thru the vertex shader and used in the fragment shader instead of gl_FragCoord.xy. Lately I’ve been using normalized texcoords with ofDisableArbTex(). This approach avoids the scaling and remapping that are otherwise needed when textures differ in size from one another and from the project window.


@TimChi Thanks. Clicking the pieces (in version 2) does not use the texture as data; it gets the coordinates from OF as a uniform vec2 (that is why it works). But I still think that I am close. I will try your hints…

I’ve studied it a bit more. Have you tried something other than texture() when sampling texture_data? Maybe texelFetch()?

Some of the 8 pixels in texture_data that contain data have adjacent pixels that are some other color (like black). I wonder if this is causing the repetition, because the value sampled by texture() is really a blended value of a pixel with its neighbors, even when that pixel gets sampled in the “middle” with a slight offset.
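To make the blending point concrete, here is a simplified 1D model of GL_LINEAR filtering with clamp-to-edge. It is only a sketch: real GPUs compute the weights with limited precision, so exact results can differ between platforms. Sampling at a texel center returns that texel, while sampling on the border between two texels returns a 50/50 mix (sampleLinear() is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// 1D model of linear filtering with clamp-to-edge. Texel centers sit at
// (i + 0.5) / n, so u is converted to texel space, and the two nearest
// texels are blended by the fractional distance between them.
float sampleLinear(const std::vector<float>& texels, float u) {
    assert(!texels.empty());
    const int n = static_cast<int>(texels.size());
    float x = u * n - 0.5f;            // texel space, centers at integers
    float base = std::floor(x);
    int i0 = static_cast<int>(base);
    float w = x - base;                // weight of the right-hand neighbor
    auto at = [&](int i) {             // clamp-to-edge lookup
        return texels[i < 0 ? 0 : (i >= n ? n - 1 : i)];
    };
    return at(i0) * (1.0f - w) + at(i0 + 1) * w;
}
```

With texels {1, 0, 0, 0}, u = 0.125 (center of texel 0) returns 1.0, but u = 0.25 (the edge between texels 0 and 1) returns 0.5.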

One thing you could try is to send some coordinates (or offsets) to random.frag as uniforms, or hard-code them there. This would isolate the sampling of texture_data from other aspects to make sure the rest of the shader is working correctly.

Then you could set the same coordinates in std::vector<> data and compare the sampling of texture_data with the hard-coded values. Comparing could involve something like float value = abs(theirDifference * factor), where a factor magnifies the difference of the two values. And value then becomes out_color = vec4(vec3(value), 1.0), so that you can “see” how much the sampled value differs from what it should be.
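The comparison idea could look roughly like this in C++. differenceGray() is a hypothetical name; in the shader the result would go into out_color = vec4(vec3(value), 1.0), and the clamp mimics what an 8-bit color buffer does anyway:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Magnify the difference between a sampled value and the expected
// (hard-coded) value, then clamp to 0..1 so it can be shown as a gray level.
float differenceGray(float sampled, float expected, float factor) {
    assert(factor >= 0.0f);
    float value = std::abs((sampled - expected) * factor);
    return std::clamp(value, 0.0f, 1.0f); // 0 = identical, 1 = very different
}
```

Black then means "sampled exactly what was expected", and even small errors show up as gray when factor is large.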

If I have a chance this weekend, I’ll write a quick study on this 2nd point. I’ve been curious about using texture() to get a “true value” for quite a while.


Hey @TimChi, thank you very much. Yeah, getting the real value would be great (not only for this example). I used your FBO technique; it’s good for visualizing the data. The data texture now has the same size as the puzzle, and I can use texCoordVarying. I guess this also fixed the adjacent-pixel issue. But the thing is, there are still duplicate pieces… Maybe I need to add half a puzzle piece size to the data position. I have not updated it on GitHub yet. I will do that and try your ideas tomorrow evening.

Edit: Maybe I need to set the puzzle texture to clamp to border…?

Edit 2: Now I remember that with this shader I get the exact pixel value, but I am not sure whether it transfers to this example: ofEmscriptenExamples/GameOfLife.frag at main · Jonathhhan/ofEmscriptenExamples · GitHub

My curiosity got the better of me and I did a quick study. On an M1 Mac at least, sampling with texture() without a slight offset does give a blended value that can vary depending on the value. But the blending and the variation stop with a slight offset, so texture() should be a reliable way to read a value with a small offset.

Different platforms might yield different results. I remember a forum thread from a while back where a shader had a floating-point issue on Linux that dimitre could not reproduce on a Mac.

This all seems like an artifact of floating-point values. Here is the code for the study:

common.frag:

#version 330

uniform sampler2D tex0;
uniform float posY;
uniform float factor;
in vec2 vTexcoord;
out vec4 fragColor;

void main(){
    vec2 tc = vTexcoord;
    // make 3 regions along y axis
    float floorY = floor(tc.y * 3.0);
    
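    // intended as the (width, height) of 1 texel; as written, this is 1.0 / (1920.0 / 1080.0)
    // (about 0.56) in both components, so texel.y * factor spans several pixels at factor = 0.01
    vec2 texel = vec2(1.0) / vec2(1920.0 / 1080.0);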
        
    vec3 color = vec3(0.0);
    if(floorY == 0.0) {
        color = texture(tex0, vec2(tc.x, posY - (texel.y * factor))).rgb;
    } else if(floorY == 1.0) {
        color = texture(tex0, vec2(tc.x, posY)).rgb; // no offset here
    } else {
        color = texture(tex0, vec2(tc.x, posY + (texel.y * factor))).rgb;
    }

    fragColor = vec4(color, 1.0);
}

common.vert:

#version 330
uniform mat4 modelViewProjectionMatrix;
in vec4 position;
in vec2 texcoord;
out vec2 vTexcoord;

void main(){
    gl_Position = modelViewProjectionMatrix * position;
    vTexcoord = texcoord;
}

ofApp.h:

#pragma once
#include "ofMain.h"
#include "ofxGui.h"

class ofApp : public ofBaseApp{
public:
    void setup();
    void update();
    void draw();
    void setImageColors();
    
    ofShader shader;
    ofFloatImage image;
    ofFbo fbo;
    ofParameter<float> posY;
    ofParameter<float> factor;
    float width;
    float height;
    
    ofxPanel panel;
    ofEventListener listener;
};

ofApp.cpp:

#include "ofApp.h"
void ofApp::setup(){
    ofSetFrameRate(60);
    ofToggleFullscreen();
    
    ofDisableArbTex();
    
    width = 1920;
    height = 1080;
    image.allocate(width, height, OF_IMAGE_COLOR);
    fbo.allocate(width, height, GL_RGBA);
        
    shader.load("common.vert", "common.frag");
    
    posY.set("posY", 0.5f, 0.f, 1.f);
    factor.set("factor", 0.01f, 0.f, 0.01f);
    setImageColors();
    
    panel.setup();
    panel.add(posY);
    panel.add(factor);
    
    listener = posY.newListener([this](const ofParameter<float>&){
        setImageColors();
    });
}
//--------------------------------------------------------------
void ofApp::update(){
    fbo.begin();
    shader.begin();
    shader.setUniform1f(posY.getName(), posY.get());
    shader.setUniform1f(factor.getName(), factor.get());
    image.draw(0.f, 0.f);// tex0 and normalized texcoord
    shader.end();
    fbo.end();
}
//--------------------------------------------------------------
void ofApp::draw(){
    fbo.draw(0.f, 0.f);
//    image.draw(0.f, 0.f);
    panel.draw();
}
//--------------------------------------------------------------
void ofApp::setImageColors(){
    for(size_t j{0}; j < image.getHeight(); ++j){
        for(size_t i{0}; i < image.getWidth(); ++i){
            if(j < posY.get() * image.getHeight()) {
                image.setColor(i, j, ofFloatColor(i / image.getWidth()));
            } else {
                image.setColor(i, j, ofFloatColor(0.f));
            }
        }
    }
    image.update();
}

It’s worth a try. I think the default wrap mode is already GL_CLAMP_TO_EDGE, and you can set it with ofTexture::setTextureWrap(). But I did see somewhere that the image2 texture has its filter (mipmap) setting at GL_NEAREST; filtering is set separately, with ofTexture::setTextureMinMagFilter().
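Since the shader leans on coordinates outside 0..1 (the fract() calls), the wrap mode matters. This is a simplified C++ model of which texel a nearest-filtered lookup would hit under GL_REPEAT and under GL_CLAMP_TO_EDGE; the function names are made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Simplified wrap-mode models for a normalized coordinate u on a texture
// n texels wide. Both return the texel index a nearest lookup would use.
int wrapRepeat(float u, int n) {
    assert(n > 0);
    float f = u - std::floor(u);                 // GL_REPEAT keeps the fractional part
    int i = static_cast<int>(std::floor(f * n));
    return i >= n ? n - 1 : i;                   // guards f * n rounding up to n
}

int wrapClampToEdge(float u, int n) {
    assert(n > 0);
    int i = static_cast<int>(std::floor(u * n));
    return i < 0 ? 0 : (i >= n ? n - 1 : i);     // never reads past the edge
}
```

With 4 texels, u = 1.25 wraps to texel 1 under GL_REPEAT but sticks to texel 3 under GL_CLAMP_TO_EDGE.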

I got the shader working now. There is still a problem with the masking, but my initial problem is solved. Basically, the issue was that I had to subtract the position from the offset. Here is the shader (it would look much simpler without the mask):

#version 300 es
// fragment shader

precision lowp float;

in vec2 texCoordVarying;
out vec4 out_Color;

uniform sampler2D texture_image;
uniform sampler2D texture_data;
uniform sampler2D texture_mask;
uniform vec2 resolution;
uniform vec2 puzzlePieces;

void main(){
	vec2 puzzlePieceSize = resolution / puzzlePieces;
	
	vec2 position1 = floor(gl_FragCoord.xy / puzzlePieceSize) / puzzlePieces;
	vec2 position2 = vec2(position1.x - 1. / puzzlePieces.x, position1.y);
	vec2 position3 = vec2(position1.x, position1.y - 1. / puzzlePieces.y);
	vec2 position4 = vec2(position1.x + 1. / puzzlePieces.x, position1.y);
	vec2 position5 = vec2(position1.x, position1.y + 1. / puzzlePieces.y);
	vec2 position6 = vec2(position1.x + 1. / puzzlePieces.x, position1.y + 1. / puzzlePieces.y);

	vec2 offset1 = texture(texture_data, gl_FragCoord.xy / resolution).rg;
	vec2 offset2 = texture(texture_data, fract((gl_FragCoord.xy + vec2(-1., 0.) * puzzlePieceSize) / resolution)).rg;
	vec2 offset3 = texture(texture_data, fract((gl_FragCoord.xy + vec2(0., -1.) * puzzlePieceSize) / resolution)).rg;
	vec2 offset4 = texture(texture_data, fract((gl_FragCoord.xy + vec2(1., 0.) * puzzlePieceSize) / resolution)).rg;
	vec2 offset5 = texture(texture_data, fract((gl_FragCoord.xy + vec2(0., 1.) * puzzlePieceSize) / resolution)).rg;
	vec2 offset6 = texture(texture_data, fract((gl_FragCoord.xy + vec2(1., 1.) * puzzlePieceSize) / resolution)).rg;
	
	vec4 col1 = texture(texture_image, gl_FragCoord.xy / resolution + offset1 - position1);
	vec4 col2 = texture(texture_image, fract(gl_FragCoord.xy / resolution + offset2 - position2));
	vec4 col3 = texture(texture_image, fract(gl_FragCoord.xy / resolution + offset3 - position3));
	vec4 col4 = texture(texture_image, fract(gl_FragCoord.xy / resolution + offset4 - position4));
	vec4 col5 = texture(texture_image, fract(gl_FragCoord.xy / resolution + offset5 - position5));
	vec4 col6 = texture(texture_image, fract(gl_FragCoord.xy / resolution + offset6 - position6));

	vec4 mask = texture(texture_mask, (gl_FragCoord.xy / resolution - position1) * puzzlePieces);
	
	if(mask.r < 1.5 / 255.){
		out_Color = col1;
	}else if(mask.r < 2.5 / 255.){
		out_Color = col2;
	}else if(mask.r < 3.5 / 255.){
		out_Color = col3;
	}else if(mask.r < 4.5 / 255.){
		out_Color = col4;
	}else if(mask.r < 5.5 / 255.){
		out_Color = col5;
	}else if(mask.r < 6.5 / 255.){
		out_Color = col6;
	}
}
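The else-if chain treats mask.r as an integer neighbor id stored as id / 255. This C++ sketch mirrors the chain exactly (chosenNeighbor() is a hypothetical name). It makes two things visible: values at or above 6.5 / 255 fall through without writing out_Color, and if texture_mask is sampled with linear filtering, a fragment on the border between id 1 and id 4 can read a blended 2.5 / 255 and pick col3, which belongs to neither side. Sampling the mask with GL_NEAREST would avoid the second case; whether that is behind the masking problem is only a guess:

```cpp
#include <cassert>

// Mirrors the else-if chain of the masked shader: returns which colN is used
// (1..6), or 0 when mask.r >= 6.5 / 255 and out_Color is never written.
int chosenNeighbor(float maskR) {
    assert(maskR >= 0.0f && maskR <= 1.0f);
    for (int k = 1; k <= 6; ++k) {
        if (maskR < (static_cast<float>(k) + 0.5f) / 255.0f) return k;
    }
    return 0;
}
```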

Here is the same shader without the mask:

#version 300 es
// fragment shader

precision lowp float;

out vec4 out_Color;

uniform sampler2D texture_image;
uniform sampler2D texture_data;
uniform vec2 resolution;
uniform vec2 puzzlePieces;

void main(){
	vec2 position1 = floor(gl_FragCoord.xy / (resolution / puzzlePieces)) / puzzlePieces;
	vec2 offset1 = texture(texture_data, gl_FragCoord.xy / resolution).rg;
	out_Color = texture(texture_image, gl_FragCoord.xy / resolution + offset1 - position1);
}
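Here is the remapping from the mask-free shader, done as exact integer math in C++. Since position is the tile origin divided by resolution, offset has to be the source tile origin divided by resolution, and uv + offset - position then lands on the same pixel inside the source tile. The data layout (row-major piece indices, as in the drawSubsection() loop) and the names are assumptions for illustration:

```cpp
#include <cassert>
#include <vector>

struct Int2 { int x, y; };

// Output pixel p lies in tile (p / pieceSize); data holds which source piece
// goes into that tile. The result is the source pixel that gets sampled.
Int2 sourcePixel(Int2 p, Int2 pieceSize, int xPieces, const std::vector<int>& data) {
    assert(pieceSize.x > 0 && pieceSize.y > 0 && xPieces > 0);
    Int2 tile{p.x / pieceSize.x, p.y / pieceSize.y};
    int src = data[tile.y * xPieces + tile.x];  // which piece goes here
    Int2 srcTile{src % xPieces, src / xPieces};
    // p - tile origin + source tile origin (the "offset - position" part)
    return {p.x - tile.x * pieceSize.x + srcTile.x * pieceSize.x,
            p.y - tile.y * pieceSize.y + srcTile.y * pieceSize.y};
}
```

In this integer version nothing shifts by a pixel, which suggests that a shift on the GPU comes from how offset, position or the sample coordinates are rounded or stored rather than from the formula itself.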

Hey that’s great that it’s working! Sometimes the math details make all the difference. I find shaders difficult because you can’t just ask them to cout some result at some interim point.

So how do you like the performance of a shader without a loop? I’ll bet you can reduce the number of texture calls by using vec4 mask and the else-if statements to pick position and offset first, and then sampling texture_image only once:

// the values of these will be set by the mask
vec2 position = vec2(0.0);
vec2 offset = vec2(0.0);

if(mask.r < 1.5 / 255.){
	position = position1;
	offset = texture(texture_data, gl_FragCoord.xy / resolution).rg; // offset1
} else if(mask.r < 2.5 / 255.){
	position = vec2(position1.x - 1. / puzzlePieces.x, position1.y); // position2
	offset = texture(texture_data, fract((gl_FragCoord.xy + vec2(-1., 0.) * puzzlePieceSize) / resolution)).rg; // offset2
}else if(mask.r < 3.5 / 255.){
    // just keep using the else-ifs to determine the values of position and offset
}

// then at the end of the else-ifs get the color:
	vec4 color = texture(texture_image, fract(gl_FragCoord.xy / resolution + offset - position));
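Another way to express the same select-first idea is a small lookup table of neighbor directions indexed by the mask id, so that position and the data coordinate are computed once. The C++ below only shows the bookkeeping (Vec2, kNeighborDir and the function names are hypothetical); a const vec2 array in GLSL could play the same role, assuming your GLSL ES version lets you index it with a runtime value. The directions match position2..6 and offset2..6 in the masked shader:

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Directions for col1..col6: self, left, down, right, up, up-right
// (gl_FragCoord space, y grows upward).
constexpr Vec2 kNeighborDir[6] = {
    {0.f, 0.f}, {-1.f, 0.f}, {0.f, -1.f}, {1.f, 0.f}, {0.f, 1.f}, {1.f, 1.f}};

// Fragment coordinate (in pixels) whose data texel describes neighbor id.
// The shader additionally wraps the normalized result with fract().
Vec2 neighborFragCoord(Vec2 fragCoord, Vec2 pieceSize, int id) {
    assert(id >= 1 && id <= 6);
    const Vec2 d = kNeighborDir[id - 1];
    return {fragCoord.x + d.x * pieceSize.x, fragCoord.y + d.y * pieceSize.y};
}

// position for neighbor id, derived from position1 the same way as position2..6.
Vec2 neighborPosition(Vec2 position1, Vec2 puzzlePieces, int id) {
    assert(id >= 1 && id <= 6);
    const Vec2 d = kNeighborDir[id - 1];
    return {position1.x + d.x / puzzlePieces.x, position1.y + d.y / puzzlePieces.y};
}
```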


@TimChi thanks for your help. The performance is already much, much better now (100x100 pieces run very smoothly on my smartphone; before, it crashed). I will post the whole result once it is stable (which has nothing to do with this shader).

I adapted the patch for desktop: ofEmscriptenExamples/puzzleTest3_desktop at main · Jonathhhan/ofEmscriptenExamples · GitHub
The only thing I do not own the rights to is the puzzle shape (but I have permission to use it), so it needs to be replaced if someone wants to use the patch in another context.
And one problem still exists: somehow some puzzle pieces are offset by a pixel or so to the right or bottom. That also happens without the mask and can lead to artifacts. I tried a lot to avoid it, to no avail…
Strangely, it only works perfectly with a 5x5 grid. Even the second piece of a 2x1 grid (at a resolution of 800x600) has an unwanted offset, which seems strange, because then it should not be a rounding error, right? Possibly I would get a better result with float textures (but it did not look like it, at least not with GL ES).

I also tried texelFetch(), which works, but it does not seem to respect GL_REPEAT, which is why I do not use it.
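texelFetch() takes integer texel coordinates and bypasses the sampler's wrap and filter settings, so GL_REPEAT has to be emulated by hand. A C++ sketch (repeatIndex() is a made-up name). One caveat for porting it: C++ defines % for negative operands, but GLSL leaves the result undefined when an operand is negative, so in the shader it is safer to add n first, e.g. (i + n) % n, which works as long as i >= -n:

```cpp
#include <cassert>

// Emulates GL_REPEAT for an integer texel coordinate i on an axis n texels long.
int repeatIndex(int i, int n) {
    assert(n > 0);
    int r = i % n;            // well-defined in C++, but may be negative
    return r < 0 ? r + n : r; // fold back into 0..n-1
}
```

For a neighbor one piece to the left of the first column, the coordinate goes negative and wraps around to the last column, like GL_REPEAT does for texture().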

Maybe a vector mask could improve the result (the mask texture is actually made from an .svg file)?
And I can imagine that there is still a rounding error somewhere that leads to this less-than-perfect result…
But the offset issue already appears with this simple shader:

void main() {
	// declarations (uniforms, out_Color, puzzlePieceSize) omitted, as in the shaders above
	vec2 position = floor(gl_FragCoord.xy / puzzlePieceSize) / puzzlePieces;
	vec2 offset = texture(texture_data, gl_FragCoord.xy / resolution).rg;
	out_Color = texture(texture_image, gl_FragCoord.xy / resolution + offset - position);
}

Other than that, it works really well if I leave the masking problem aside and use puzzle piece sizes that divide evenly into the puzzle size…

Hey, just out of curiosity, did you try using textures that are a power of 2 (so, 1024x1024 or 1024x512)? Or how about making the texture size a whole multiple of the grid size? For example, a 7x5 grid would have textures that are 896x640. The 5x5 case is pretty strange, but maybe the lack of an offset there is a clue to how to fix it.
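The "whole multiple of the grid size" suggestion can be computed directly by rounding each dimension down to a multiple of the piece count. A C++ sketch with hypothetical names; a 7x5 grid of 128x128 pieces gives the 896x640 mentioned above:

```cpp
#include <cassert>

struct Size { int w, h; };

// Largest texture size not exceeding (maxW, maxH) that splits into a
// cols x rows grid of pieces with whole-pixel sizes.
Size fitToGrid(int maxW, int maxH, int cols, int rows) {
    assert(cols > 0 && rows > 0 && maxW >= cols && maxH >= rows);
    return {maxW / cols * cols, maxH / rows * rows};
}
```

For example, fitToGrid(800, 600, 7, 5) returns 798x600, i.e. 114x120 pixels per piece.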

Also very glad to see this! I was hoping that the performance of the shader would be worth the struggle of using it.

Hey @TimChi, thanks. I just tried power-of-two textures, but it made no difference. But I found out that 15x1, 15x15 and 1x15 also work perfectly.
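One possible explanation for which grid sizes work, not verified against the project: if texture_data is an 8-bit normalized texture (e.g. GL_RGBA8), an offset k / n is stored as round(k / n * 255) / 255. That is exact for every k only when n divides 255 = 3 * 5 * 17, so 1, 5 and 15 pieces per axis survive unchanged, while with 2 pieces 0.5 becomes 128 / 255, about 1.6 pixels off at 800 pixels wide. A C++ sketch of that arithmetic (the function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Assumption: the value is stored in an 8-bit normalized channel,
// i.e. as round(v * 255) / 255.
float storeUnorm8(float v) {
    assert(v >= 0.0f && v <= 1.0f);
    return std::round(v * 255.0f) / 255.0f;
}

// Pixel error of the stored offset for piece k of n pieces along an axis
// that is `size` pixels long.
float offsetErrorPixels(int k, int n, float size) {
    assert(n > 0 && k >= 0 && k < n);
    float exact = static_cast<float>(k) / static_cast<float>(n);
    return std::fabs(storeUnorm8(exact) - exact) * size;
}
```

If this is the cause, storing integer piece indices in the data texture (read back with round(value * 255.0)) and computing the offset as index / puzzlePieces in the shader, or uploading the data as a true float texture, might avoid the error.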