Compute Shader Multiplication Operator Appears Not to Work

Hello, I’m working on a compute shader to implement reaction-diffusion (more info here: Reaction-Diffusion Tutorial). I won’t go into detail about the full shader because I’ve created a minimal example of the problem I’m experiencing.

I’ve narrowed the problem down to the fact that the * (multiplication) operator does not appear to work when multiplying two floats. You can see the minimal code below and the results from running it on my system. If anyone can offer advice, that would be much appreciated. Even just running the code on your own system would be awesome, so I can rule out the driver, hardware, and/or openFrameworks version as the cause.


Screenshot of my result. If multiplication were working correctly, the third texture would be filled with the same pixels, just squared, so it would be slightly more filled in.

Note: to test the shader, press any key while the window is in focus to run it once.

bin/data/multi_bug.comp

#version 440

layout(r8, binding = 0) uniform readonly image2D in_img;
layout(r8, binding = 1) uniform writeonly image2D out_img1;
layout(r8, binding = 2) uniform writeonly image2D out_img2;

layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

void main() {

  float val = imageLoad(in_img,ivec2(gl_GlobalInvocationID.xy)).r / 255.0;

  imageStore(out_img1,ivec2(gl_GlobalInvocationID.xy),vec4(vec3(clamp(val, 0.0, 1.0) * 255.0), 255.0));
  imageStore(out_img2,ivec2(gl_GlobalInvocationID.xy),vec4(vec3(clamp(val * val, 0.0, 1.0) * 255.0), 255.0));
}

src/main.cpp

#include "ofApp.h"
#include "ofMain.h"

int main() {
  ofGLFWWindowSettings settings;
  settings.setGLVersion(4, 4);
  ofCreateWindow(settings);
  ofRunApp(new ofApp());
}

src/ofApp.h

#pragma once

#include "ofMain.h"

class ofApp : public ofBaseApp {
 public:
  int m_width, m_height;

  ofShader m_comp_shader;
  vector<ofTexture> m_comp_textures;

  void setup();
  void update();
  void draw();

  void keyPressed(int key);
};

src/ofApp.cpp

#include "ofApp.h"

void ofApp::setup() {
  m_width = 200;
  m_height = 200;

  m_comp_shader.setupShaderFromFile(GL_COMPUTE_SHADER, "multi_bug.comp");
  m_comp_shader.linkProgram();

  // setup comp_textures
  int num_comp_textures = 3;
  m_comp_textures.reserve(num_comp_textures);
  for (int i = 0; i < num_comp_textures; ++i) {
    m_comp_textures.emplace_back();
  }
  for (size_t i = 0; i < m_comp_textures.size(); ++i) {
    m_comp_textures[i].allocate(m_width, m_height, GL_R8);
  }

  ofPixels tex1_pix;
  m_comp_textures[0].readToPixels(tex1_pix);
  for (size_t i = 0; i < tex1_pix.size(); ++i) {
    tex1_pix.setColor(i, ofColor(ofRandom(255)));
  }
  m_comp_textures[0].loadData(tex1_pix);

  m_comp_textures[0].bindAsImage(0, GL_READ_ONLY);
  m_comp_textures[1].bindAsImage(1, GL_WRITE_ONLY);
  m_comp_textures[2].bindAsImage(2, GL_WRITE_ONLY);
}

void ofApp::update() {}

void ofApp::draw() {
  ofBackground(127);

  for (size_t i = 0; i < m_comp_textures.size(); ++i) {
    auto tex = m_comp_textures[i];
    tex.draw(m_width * i, 0, m_width, m_height);
    ofSetColor(255);
    ofNoFill();
    ofDrawRectangle(m_width * i, 0, m_width, m_height);
  }
}

void ofApp::keyPressed(int key) {
  m_comp_shader.begin();
  m_comp_shader.dispatchCompute(m_width, m_height, 1);
  m_comp_shader.end();
}

System Information:

oF version: of_v0.11.2_vs2017_release
Visual Studio: VS2019 Version 16.10.2
Windows 10: 64bit, Home, 20H2, 19042.1052
GPU: NVIDIA GeForce GTX 1060 with Max-Q Design
NVIDIA driver: 471.11

Further explanation of things I’ve already tried:

I’ve tried all the versions of OpenGL that support compute shaders.

I’ve updated to the latest nvidia drivers.

I’ve also tried other gl image formats.

I know this is where the problem is because I added a crude multiply function that only uses addition (and so isn’t exact for floats), and it appeared to “fix” the problem. Unfortunately, because my function is inaccurate for floats, it isn’t a long-term solution. I’ve thought about implementing my own IEEE 754 float multiplication function, but I’d like to avoid that because I think it would hurt performance, and this is code that will eventually be deployed for a client.
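To give an idea of what I mean by a crude multiply built from addition, here is a sketch (not my exact function) that only behaves when the second operand is a whole number, which is why it can’t replace a proper float multiply:

// Hypothetical sketch of a multiply implemented with addition only.
// Exact only when b is a non-negative whole number.
float add_mul(float a, int b) {
  float result = 0.0;
  for (int i = 0; i < b; ++i) {
    result += a;
  }
  return result;
}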

Update: this seems likely to be a hardware/driver issue, because I recreated the program from scratch in Cinder with the same result. Either that or I don’t know how to write a compute shader correctly. I’ll likely try different hardware next. Still open to other advice. If all else fails, I might just end up trying to optimize a CPU version.

Hey, I’ve been mulling this over for a couple of days, but I have very minimal experience with compute and geometry shaders.

Have you run the computeShaderTextureExample in the /gl examples folder? It seems similar in some ways to your simplified project code, particularly the GL_R8 format, the r8 layout qualifier, and a shader that multiplies floats. I’ve been wondering whether the GL_R8 format type (unsigned char) is really compatible with the layout qualifier r8 (a normalized floating-point type).
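If r8 does hand the shader normalized floats, then imageLoad would already return values in the 0.0–1.0 range and imageStore would clamp to that range, so the extra / 255.0 and * 255.0 steps could be squashing the squared result toward zero rather than the * operator failing. This is just a guess on my part, but a version of the kernel that works purely in normalized values might look something like this:

// Sketch assuming r8 images load/store normalized floats in [0.0, 1.0].
void main() {
  ivec2 p = ivec2(gl_GlobalInvocationID.xy);
  float val = imageLoad(in_img, p).r;                   // already 0.0–1.0 if normalized
  imageStore(out_img1, p, vec4(vec3(val), 1.0));        // straight copy
  imageStore(out_img2, p, vec4(vec3(val * val), 1.0));  // squared (darker)
}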

Also, have you tried using floats for everything: an ofFbo format of GL_RGB16F (or GL_RGBA16F), ofFloatColor, ofFloatImage, a corresponding layout qualifier, etc.?
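On the shader side that might look roughly like the declarations below (just my guess at the matching qualifier, with the textures allocated as GL_RGBA16F on the C++ side):

// Sketch: rgba16f layout qualifiers to pair with GL_RGBA16F textures.
layout(rgba16f, binding = 0) uniform readonly image2D in_img;
layout(rgba16f, binding = 1) uniform writeonly image2D out_img1;
layout(rgba16f, binding = 2) uniform writeonly image2D out_img2;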

I wish my GLSL were a bit better; I wish I could help more definitively!