ofxShader: mesh displacement, perlin noise

I’ve been looking into ways to speed up the rendering of dense 3d meshes for playing back realtime-3d-scans. Using displacement maps seems really fast: I’m getting 60+ fps for a full resolution 640x480 map with about half the pixels in use (that’s about 300k triangles).
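For anyone checking the triangle figure, here is a rough back-of-the-envelope sketch. `trianglesInGrid` is a hypothetical helper, not part of the attached code, and it assumes two triangles per grid cell with the given fraction of cells actually in use.

```cpp
// Rough triangle count for a w x h displacement grid.
// Hypothetical helper: assumes 2 triangles per grid cell and that only
// `fractionInUse` of the cells end up being visible.
long trianglesInGrid(int w, int h, double fractionInUse) {
    long cells = static_cast<long>(w - 1) * (h - 1);
    return static_cast<long>(cells * 2 * fractionInUse);
}
```

With a 640x480 map and half the pixels in use, this lands a little above 300k, which matches the number quoted above.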

[attachment=1:n2m6q5vu]displacement-demo.png[/attachment:n2m6q5vu]

The attached code is using two tricks:

1. ofxDisplayList, which caches the mesh structure so it doesn’t have to be recreated each frame

2. ofxShader, which I went through and cleaned up a bit, including minor things like making the .vert/.frag files read from the /data folder instead of adjacent to the binary. More importantly, I added:

  
  
void setSampler2d(string name, ofImage& img, int textureLocation);  
  

This allows you to set a sampler2D uniform in a shader, which can be useful for things like environment mapping, convolution + kernel based texture processing, displacement maps, etc.

Unfortunately I’ve only got it working on Windows; the sampler2D doesn’t seem to work on OSX. If anyone knows why, I’d be really grateful to hear :slight_smile:

DisplacementRender.zip

Hey Kyle, that pic looks amazing!

right, there’s a LOT of things going on here :slight_smile:

Instead of glTexParameterf etc. you can use the code below. Same thing, just less :slight_smile:

  
  
	ofSetMinMagFilters(GL_NEAREST, GL_NEAREST);  
  

Also, you were modifying color in your frag shader. It’s a varying, which is read-only in the fragment shader, so you’re not allowed to do that. Your GLSL compiler must have ignored it, but my compiler (gfx drivers) didn’t and gave an error.

Finally, but most importantly, your shader is programmed to take in a GL_TEXTURE_2D (dimensions have to be powers of two, and texture coordinates are 0…1). OF, on the other hand, defaults to GL_TEXTURE_RECTANGLE_ARB, which comes from an extension that allows non-pow-2 sizes and pixel texture coordinates. On your system you must not have that extension (how old is your system!?) so it goes back to GL_TEXTURE_2D.
So you have a few options:

  • force OF to not use GL_TEXTURE_RECTANGLE_ARB: call ofDisableArbTex(); before loading the texture (so it will behave the same on other systems which do have the extension, which OF prefers).
  • tell your shader to use sampler2DRect instead of sampler2D and modify your shader to use pixel coordinates instead of normalized ones (won’t work on yours, I think, cos you don’t have the extension).
  • call ofEnableNormalizedTexCoords() in setup to set all textures to use normalized coordinates (this is on github but I don’t think it’s in the distro version). (again, won’t work on yours cos you don’t have GL_TEXTURE_RECTANGLE_ARB).

My preferred option is the last one. Texture coordinates are then always 0…1 in all cases, and your texture doesn’t get padded. If you force GL_TEXTURE_2D like in the first option, then your 640x480 texture gets put inside a 1024x512 texture, so texture coordinate 1,1 is not actually the bottom right of your image; it’s past the bottom right and you get loads of random noise, which can be pretty, but undesirable at times. This is also why you have to reupload the texture. The better way to get past that problem is to use ofTexture::getCoordFromPoint().
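To make the padding issue concrete, here is a minimal sketch of the arithmetic involved. It is not ofTexture’s actual implementation; `nextPow2`, `TexCoord` and `coordFromPoint` are hypothetical names, and it assumes the image sits in the top-left corner of a texture padded up to the next power of two in each dimension.

```cpp
// Smallest power of two that is >= n (assumes n > 0).
int nextPow2(int n) {
    int p = 1;
    while (p < n) p *= 2;
    return p;
}

struct TexCoord { float u, v; };

// Map a pixel position inside a w x h image to normalized coordinates
// in the padded power-of-two texture that holds it.
// Assumption: the image is stored at the top-left of the padded texture.
TexCoord coordFromPoint(float x, float y, int w, int h) {
    return { x / nextPow2(w), y / nextPow2(h) };
}
```

For a 640x480 image the bottom right corner maps to roughly (0.625, 0.9375), not (1, 1), which is why sampling at 1,1 reads from the padding.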

P.S. I think you’re using a very old version of ofxShader! It doesn’t have any attribute stuff (unless I added that? can’t remember). Also your ofxShader didn’t give an error on the varying color mod, whereas the one I’m using did (i’ve attached it).

P.P.S. you were drawing the text in red and then not setting the color to white again, so the whole mesh was coming up red.

P.P.P.S. display lists are very oldschool and are being deprecated. in this case the best performance I think will come from using a VBO (or possibly geometry shader). I have an example on how to set them up at
http://memo.tv/vertex-arrays-vbos-and-p-…-eworks-006

modded code below with ofxShader attached (this is not using VBO, just the minor mods mentioned above).

shader.vert:

  
uniform sampler2D displacementMap;  
uniform float depthScaling;  
varying vec4 color;  
  
//const float depthScaling = 255. * 2.5;  
  
void main() {  
	color = texture2D(displacementMap, gl_MultiTexCoord0.xy);  
	vec4 pos = gl_Vertex;  
	pos.z += color.a * depthScaling;  
	gl_Position = gl_ModelViewProjectionMatrix * pos;  
}  
  

shader.frag:

  
  
varying vec4 color;  
  
void main(void) {  
	if(color.a == 0.) discard;  
	gl_FragColor = vec4(color.xyz, 1.0);  
}  
  

testApp.cpp:

  
#include "testApp.h"  
  
void testApp::setup() {  
	ofSetVerticalSync(false);  
	  
	ofDisableArbTex();  
	ofSetMinMagFilters(GL_NEAREST, GL_NEAREST);  
	displacement.loadImage("displacement.png");  
	  
	shader.loadShader("shader");  
	  
	rotX = 0;  
	rotY = 0;  
	  
	glEnable(GL_DEPTH_TEST);  
}  
  
void testApp::update(){  
}  
  
void testApp::draw(){  
	ofBackground(128, 128, 128);  
	  
	ofSetColor(255, 255, 255);  
	  
	glPushMatrix();  
	  
	float w = displacement.getWidth();  
	float h = displacement.getHeight();  
	  
	glTranslatef(ofGetWidth() / 2, ofGetHeight() / 2, 0);  
	rotX = ofLerp(mouseX, rotX, rotSmooth);  
	rotY = ofLerp(mouseY, rotY, rotSmooth);  
	glRotatef(rotX, 0, 1, 0);  
	glRotatef(-rotY, 1, 0, 0);  
	glTranslatef(-w / 2, -h / 2, -320);  
	  
	shader.setShaderActive(true);  
	displacement.getTextureReference().bind();  
	shader.setUniform("displacementMap", 0);  
	shader.setUniform("depthScaling", 300.0f);  
	  
	if(!mesh.draw()) {  
		mesh.begin();  
		int step = 1;  
		glBegin(GL_TRIANGLE_STRIP);  
		for(int y = 0; y < h; y += step) {  
			for(int x = 0; x < w; x += step) {  
				ofPoint texCoords;  
				  
				texCoords = displacement.getTextureReference().getCoordFromPoint(x, y);  
				glTexCoord2f(texCoords.x, texCoords.y);  
				glVertex2f(x, y);  
				  
				texCoords = displacement.getTextureReference().getCoordFromPoint(x, y + step);  
				glTexCoord2f(texCoords.x, texCoords.y);  
				glVertex2f(x, y + step);  
			}  
		}  
		glEnd();  
		mesh.end();  
	}  
	  
	shader.setShaderActive(false);  
	displacement.getTextureReference().unbind();  
	  
	glPopMatrix();  
	  
	ofSetColor(255, 0, 0);  
	ofDrawBitmapString(ofToString(ofGetFrameRate()), 20, 20);  
}  
  

ofxShader.zip

Memo, that’s one of the most useful posts I’ve ever seen.

I took all your comments to heart, along with the version of ofxShader you’re using, and merged them into a new version of ofxShader. As I understand it, this is the most up-to-date ofxShader currently available:

http://code.google.com/p/kyle/source/browse/trunk/openframeworks/addons/ofxShader/src/ofxShader.h

It isn’t completely backwards compatible with older versions, but it has all the features. I tried to design it as cleanly as possible. It uses strings for clarity in the source, except when const char*s are used for speed (e.g., attribute setting).

Here’s a quick list of changes:

  • I noticed glVertexAttrib4s was being called instead of glVertexAttrib4d
  • unload() moved into the destructor so it happens automatically
  • setShaderActive is now begin() and end()
  • syntax for the attribute location stuff is regularized, and I changed an incorrect naming (getUniformLocation -> getAttributeLocation)
  • changed GLints to ints where reasonable to keep people from getting scared
  • removed getActiveVertexAttribute which wasn’t being used
  • changed error messages to ofLog errors
  • fixed the lines that say:
  
  
if(compileStatus > 0) {...}  
else if (compileStatus == 1) {...}  
  

Which didn’t make any sense :wink: I’m working with r61, so I left this out.

  • Very interesting that you can’t modify varyings in the fragment shader, I didn’t know!

  • My system actually does have sampler2DRects, it just requires me to add this line to the vertex shader:

  
  
#extension GL_ARB_texture_rectangle : enable  
  

It makes things so much cleaner… thank you!

  • While display lists are on their way out, it was only five lines to add it in this case (include, object instance, begin, end, draw) while VBOs take me a little longer. I’m also not convinced that, in a case like this, there is a speed difference. Why would there be?

  • Do you have a geometry shader demo? It sounds really interesting.

DisplacementRender.zip

ofxShader.zip

Here’s another demo, generating some 2d fractal noise.

It’s fairly low quality fractal noise, but it’s pretty fast. It works in conjunction with some C++ code that sets up the scaling and dropoff multipliers.

ShaderNoise.zip

uglytv, I feel like those are the kind of errors I get when I’ve set my SDK version incorrectly. Maybe try another version of the SDK?

The other possibility is that you copied testApp.cpp but not testApp.h?

looks nice!

on a mac, I got these errors with the shader:

  
  
OF_ERROR: fragment shader reports:  
ERROR: 0:2: '=' :  assigning non-constant to 'const float'  
ERROR: 0:5: 'array of float' : array type not supported here in glsl < 120   
ERROR: 0:6: 'array of float' : array type not supported here in glsl < 120   
ERROR: 0:36: 'weights' :  left of '[' is not of type array, matrix, or vector    
ERROR: 0:36: 'scaling' :  left of '[' is not of type array, matrix, or vector    
  

in order to get it to compile on the mac, I had to fiddle a bit with the declaration at the top of the fragment code. I changed how pi was defined, and how the arrays were declared.

  
  
const int octaves = 8;  
const float pi = 3.1415926535;  
uniform float seed;  
uniform float weights[octaves] ;  
uniform float scaling[octaves];  
uniform float normalization;  
  

take care,
zach

Ahh, thanks so much Zach. I didn’t test this one on OSX. My bad – I don’t know how the array size got to the lhs (much less how my card managed to understand it).

I figured the asin(1.) was safe, but 3.14… is safer :wink:

Wow, that’s an obscure bug I haven’t run into before.

http://bytes.com/topic/c/answers/633786-why-cant-static-const-float-class-members-inititalized-class

Works fine with my version of gcc via the Code::Blocks projects.

The solution would be to change that line in testApp.h to:

  
  
static float rotSmooth;  
  

(i.e., just get rid of the const)

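And in testApp.cpp, add this (the static member also needs a definition at file scope, otherwise the linker won’t find it):

  
  
float testApp::rotSmooth;  
  
void testApp::setup() {  
  rotSmooth = .9;  
...  
}  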
  

That should do it.
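If you would rather keep the const, a portable alternative is to declare the member in the class and give it its value in a single out-of-class definition. This is only a sketch: `App` is a hypothetical stand-in for testApp, and 0.9 is the value used above.

```cpp
// Sketch of the "declare in the header, define in the .cpp" pattern.
// `App` is a hypothetical stand-in for testApp.
struct App {
    // Declaration only: no in-class initializer, since older compilers
    // reject in-class initialization of non-integral static const members.
    static const float rotSmooth;
};

// Definition with the value; this would live in exactly one .cpp file.
const float App::rotSmooth = 0.9f;
```

This keeps the value constant and avoids relying on a compiler extension.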

Hey Kyle, I wanted to extend your perlin noise shader to support 3 dimensions, but I’m not sure how I would go about it. I was able to extend the rand() function to produce a vec3, but I’m not sure how to integrate that with the other functions, and I’m not really sure what they are doing anyway. To be honest I don’t really understand any of the math you’re using, it seems like black magic to me. Where did you find this method of noise creation? Any help would be greatly appreciated.

this is the core idea of fractal perlin noise:

  
  
	for(int i = 0; i < octaves; i++)  
		total += weights[i] * bilinearRand((pos + seed * float(i + 1)) * scaling[i]);  
  

you have the bilinearRand function, which describes a smoothly varying field over N dimensions. then you take a bunch of samples at different scaling factors (some low frequency, some high frequency) and do a weighted sum, which gets normalized at the end.

if you want to extend it from 2 dimensions to 3 dimensions, you just have to write a bilinearRand function that works for vec3 instead of vec2.

here’s the current code:

  
  
float bilinearRand(vec2 pos) {  
	vec2 f = fract(pos);  
	vec2 left = vec2(  
		rand(floor(pos)),  
		rand(floor(pos + vec2(0, 1))));  
	vec2 right = vec2(  
		rand(floor(pos + vec2(1, 0))),  
		rand(floor(pos + vec2(1, 1))));  
	vec2 vert = cosMix(left, right, f.x);  
	return cosMix(vert.x, vert.y, f.y);  
}  
  

imagine ‘pos’ is in a grid. floor(pos) will give you the top left corner of the grid, floor(pos + vec2(0, 1)) will give you the bottom left, floor(pos + vec2(1, 0)) is top right, floor(pos + vec2(1, 1)) is bottom right.

at each of these ‘corner’ positions, we sample a random function with rand().

writing a vec3 rand should be simple, something like this might work:

  
  
float rand(vec3 coord) {  
	return fract(sin(dot(coord.xyz, vec3(12.9898, 78.233, 471.169))) * 43758.5453);  
}  
  

that new number, 471.169, might have to be tweaked to make a better rand(). making a rand() out of a sin() is based on the fact that when you take really large, high-frequency samples from a sine wave it starts to look like noise because of floating point errors.
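If you want to experiment with that constant outside of a shader, a CPU version of the same hash might look like the sketch below. `fractf` and `rand3` are hypothetical names, and float results on the CPU will not necessarily match what a given GPU produces.

```cpp
#include <cmath>

// GLSL-style fract(): the fractional part, x - floor(x).
float fractf(float x) { return x - std::floor(x); }

// CPU sketch of the vec3 rand() above: dot the coordinate with some
// arbitrary constants, take sin(), scale it up and keep the fraction.
float rand3(float x, float y, float z) {
    float d = x * 12.9898f + y * 78.233f + z * 471.169f;
    return fractf(std::sin(d) * 43758.5453f);
}
```

Printing a grid of rand3 values for a few candidate constants is a quick way to spot obvious banding or repetition.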

after you take all the samples, you mix them together. the top left gets mixed with the top right, the bottom left gets mixed with the bottom right:

  
  
vec2 vert = cosMix(left, right, f.x);  
  

the mix amount is based on the fractional x position within the grid. so if you’re more to the left, the left is weighted more. and vice versa for the right.

finally, you mix the top (mixed from the top left, top right) and the bottom (mixed from the bottom left and bottom right) into one value:

  
  
cosMix(vert.x, vert.y, f.y);  
  

this could probably be written in a cleaner way, but i found this kind of mixing was fastest.
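For reference, the same two mixing steps written out with plain floats on the CPU look roughly like this. It is a sketch of the idea rather than the shader itself; `mixf`, `kPi` and `bilinear` are hypothetical names, and `mixf` mirrors GLSL’s mix().

```cpp
#include <cmath>

const float kPi = 3.14159265f;

// Same as GLSL mix(): linear blend from a to b.
float mixf(float a, float b, float t) { return a * (1.f - t) + b * t; }

// Cosine interpolation: ease t with a half cosine, then blend linearly.
float cosMix(float a, float b, float t) {
    return mixf(a, b, (1.f - std::cos(t * kPi)) / 2.f);
}

// Blend four corner samples. fx and fy are the fractional position
// inside the grid cell, both in [0, 1].
float bilinear(float topLeft, float topRight,
               float bottomLeft, float bottomRight,
               float fx, float fy) {
    float top = cosMix(topLeft, topRight, fx);        // left/right mix, top edge
    float bottom = cosMix(bottomLeft, bottomRight, fx); // left/right mix, bottom edge
    return cosMix(top, bottom, fy);                   // top/bottom mix
}
```

At the corners you get the corner values back, and in the middle of the cell you get the average of all four.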

if you want to extend the idea to 3d, you need to sample 8 corners instead of 4. then you need to mix them all based on how far the ‘pos’ is to each corner. you could do it once for the front square, then again for the back square, then mix them together – or you could try and find the distance from the pos to the corner directly with length(). i’m not sure which would be faster.

hope that gets you started!

Kyle, Thanks for your post, very informative!

I think this code should work, with the 3rd dimension passed in as a uniform:

  
  
const int octaves = 8;  
float pi = (asin(1.) * 2.0);  
  
uniform float seed;  
uniform float[octaves] weights;  
uniform float[octaves] scaling;  
uniform float normalization;  
uniform float zPos;  
  
float rand(vec3 coord) {  
    return fract(sin(dot(coord.xyz, vec3(12.9898, 78.233, 471.169))) * 43758.5453);  
}  
  
vec4 cosMix(vec4 x, vec4 y, float a) {  
	return mix(x, y, (1. - cos(a * pi)) / 2.);  
}  
  
vec2 cosMix(vec2 x, vec2 y, float a) {  
	return mix(x, y, (1. - cos(a * pi)) / 2.);  
}  
  
float cosMix(float x, float y, float a) {  
	return mix(x, y, (1. - cos(a * pi)) / 2.);  
}  
  
float bilinearRand(vec3 pos) {  
	vec3 fractPos = fract(pos);  
	vec3 floorPos = floor(pos);  
  
	vec4 bot = vec4(  
        rand(floorPos),  
        rand(floorPos + vec3(1,0,0)),  
        rand(floorPos + vec3(1,1,0)),  
        rand(floorPos + vec3(0,1,0)));  
  
    vec4 top = vec4(  
        rand(floorPos + vec3(0,0,1)),  
        rand(floorPos + vec3(1,0,1)),  
        rand(floorPos + vec3(1,1,1)),  
        rand(floorPos + vec3(0,1,1)));  
  
	vec4 xMix = cosMix(bot,top,fractPos.x);  
	vec2 yMix = cosMix(xMix.xy,xMix.zw,fractPos.y);  
  
	return cosMix(yMix.x,yMix.y, fractPos.z); //z mix  
}  
  
float noise(vec3 pos) {  
	float total = 0.;  
	for(int i = 0; i < octaves; i++)  
		total += weights[i] * bilinearRand((pos + seed * float(i + 1)) * scaling[i]);  
	return total * normalization;  
}  
  
void main(void){  
	vec3 pos = vec3(gl_TexCoord[0].xy,zPos);  
	gl_FragColor = vec4(vec3(noise(pos)), 1.);  
}  
  
  

Unfortunately I don’t know if it is working because I keep getting

  
  
r300 FP: Compiler Error:  
r500_fragprog_emit.c::emit_paired(): emit_alu: Too many instructions  
  

from the shader compiler. I’m running on an older macbook pro that has the ati X1600 graphics card, which I’m guessing has very little memory dedicated to shader programs. Can you think of any way to pare this down more? I did all I could, but it’s still not enough. I even tried removing the cosMix and using the built in mix function directly, but that didn’t help. it must be coming from those two vec4s, i would assume.

As far as I know:

  
uniform float[octaves] weights;    
uniform float[octaves] scaling;   

at least in 1.2 needs to be:

  
uniform float weights[octaves];    
uniform float scaling[octaves];   

This: float pi = (asin(1.) * 2.0) is pretty awesome, never seen it before, but I like it :slight_smile:

Hi joshua, I actually just got it working, the compiler doesn’t seem to care about the way the arrays are declared at all. I really was hitting the hardware’s limit for the shader code-size.

I noticed that if I reduced the “octaves” variable down to 5 it would compile and run, 6+ gives me the too many instructions error. Interestingly enough, the open source gallium3d driver has a GLSL compiler that heavily optimizes for speed and apparently unrolls loops and inlines functions wherever it can. Since “octaves” is a const variable, it can unroll the noise loop and generate a huge amount of code from a tiny shader. I was able to trick it by making octaves a uniform so that it won’t know how many times to unroll the loop and has to keep it the way it is, reducing the generated code’s size. I found out about what the GLSL compiler was doing by finding this when I was looking for ways to hand-optimize the code: http://aras-p.info/blog/2010/09/29/glsl-optimizer/

I also realized that I had some of the coordinates backwards after I could see the output. Here’s the final fragment shader for 3d noise:

  
  
//const int octaves = 5;  
const vec3 rDot = vec3(12.9898, 78.233, 42.43);  
const float rScale = 43758.5453;  
  
uniform int octaves;  
uniform float seed;  
uniform float[8] weights;  
uniform float[8] scaling;  
uniform float normalization;  
uniform float zPos;  
  
  
float pi = (asin(1.) * 2.0);  
vec3 fractPos,floorPos;  
vec4 bot,top,zMix;  
vec2 yMix;  
  
float rand(vec3 coord) {  
    return fract(sin(dot(coord.xyz, rDot)) * rScale);  
}  
  
float bilinearRand(vec3 pos) {  
	fractPos = (1. - cos(fract(pos)*pi))*0.5;  
	floorPos = floor(pos);  
  
	bot = vec4(  
        rand(floorPos),  
        rand(floorPos + vec3(1,0,0)),  
        rand(floorPos + vec3(0,1,0)),  
        rand(floorPos + vec3(1,1,0)));  
  
    top = vec4(  
        rand(floorPos + vec3(0,0,1)),  
        rand(floorPos + vec3(1,0,1)),  
        rand(floorPos + vec3(0,1,1)),  
        rand(floorPos + vec3(1,1,1)));  
  
    zMix = mix(bot,top, fractPos.z);  
    yMix = mix(zMix.xy, zMix.zw, fractPos.y);  
    return mix(yMix.x,yMix.y, fractPos.x);  
}  
  
float noise(vec3 pos) {  
	float total = 0.;  
	for(int i = 0; i < octaves; i++)  
		total += weights[i] * bilinearRand((pos + seed * float(i + 1)) * scaling[i]);  
	return total * normalization;  
}  
  
void main(void){  
	gl_FragColor = vec4(vec3(noise(vec3(gl_TexCoord[0].xy,zPos))), 1.);  
}  
  
  

Wow, really nice link. In retrospect I should have guessed that your problem was something a little more sophisticated :slight_smile:

hmm, my reply disappeared, but you seem to have managed to read it anyway! In regards to “the link which I won’t post again for fear that it was the reason my post was flagged as spam”, It would be really nice to do a comparison of shader performance on Intel/AMD/Nvidia graphics with/without this optimizer. Also, it would be really nice to have this optimizer as a web tool. I may tackle one or both of these at some point, but not now as ironing out that bug was pretty mentally exhausting.

cheers!

wow! that’s so cool you got it working!

i’ve sometimes run into those limitations of code size or array size – it can be very confusing to debug them. my only plan of attack in those cases is the usual ‘simplify everything’ approach. rip out parts to test on their own, and slowly add things in until it doesn’t work. when you can see that everything works by itself, but not together, it’s code size. if you can see the arrays by themselves are breaking things, it’s the array size.

the optimizer link is super interesting. it looks like GLSL optimization works a lot like C/C++ optimization, you just can’t trust the compiler as much.

re spam, if you get flagged it’s just a matter of time before an admin checks the spam filter and finds your message. getting marked doesn’t do anything to your account.

where and how do you set the Zpos ??

thanks