Raycaster - fastest way to write pixels

I’m a beginner user of openFrameworks and this is my first post here, so first I would like to say hello to everyone!

I’ve been writing a typical CPU-based raycasting engine for learning purposes and I noticed really poor performance when filling a 1920x1080 image buffer with calculated pixel colors.
I’m using ofImage as a buffer to draw to and I’m accessing ofPixels directly using ofPixels.setColor(). Once all pixels are set I’m calling ofImage.update() to update the texture and then ofImage.draw() to put the image on screen.
From what I see, all the time is spent on setting the calculated pixel colors. If I just skip that line (doing all the calculations but skipping the write to ofPixels), performance skyrockets from 10fps to 250-300fps or so.

Would you have any suggestions on how to make creating the image on screen faster considering the image is generated using CPU in every frame? Should I get GPU involved in the process somehow?
I will be grateful for any pointers so I can dig deeper and find a better solution.

Edit: If I set a single pixel byte (the green component, for example) by accessing the pixels directly using array syntax, performance is a bit better. So setColor() doesn’t seem to be efficient. Perhaps it would be faster if I had the image buffer as RGBA and wrote each pixel as a 4-byte word?
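
A rough sketch of the 4-byte-word idea from the edit above, outside openFrameworks. The buffer is a plain std::vector standing in for RGBA pixel data, and packRGBA and fillRGBA are hypothetical names, not openFrameworks functions. The packing assumes a little-endian CPU so that the bytes land in memory in R, G, B, A order. Whether this is actually faster than three byte writes would need measuring.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: pack four 8-bit channels into one 32-bit word.
// On a little-endian CPU the bytes end up in memory as R, G, B, A.
inline std::uint32_t packRGBA(std::uint8_t r, std::uint8_t g, std::uint8_t b,
                              std::uint8_t a = 255) {
    return  static_cast<std::uint32_t>(r)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(b) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

// Fill a width*height RGBA buffer (4 bytes per pixel) with one colour,
// writing one 32-bit word per pixel.
inline void fillRGBA(std::vector<std::uint8_t>& buffer, int width, int height,
                     std::uint32_t colour) {
    const std::size_t pixelCount = static_cast<std::size_t>(width) * height;
    assert(buffer.size() == pixelCount * 4);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        // memcpy of a fixed 4 bytes sidesteps alignment and aliasing concerns;
        // compilers typically turn it into a single 32-bit store.
        std::memcpy(&buffer[i * 4], &colour, sizeof(colour));
    }
}
```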


Hello and welcome to the forum
I think writing directly to the pixels is faster, something like this

pixels[index * 3] = r;
pixels[index * 3 + 1] = g;
pixels[index * 3 + 2] = b;

If you are sure your color object matches the number of channels and the channel order of your ofPixels object, you can use memcpy instead to write an entire pixel at once:

memcpy(&pixels[index * 3], &color, 3);

If you don’t need alpha, use only RGB colors.
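
For what it’s worth, here is a minimal sketch of the direct-write idea with the (x, y) to byte-offset mapping spelled out. It uses a plain std::vector as a stand-in for the ofPixels data, and RgbBuffer and setPixel are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RGB pixel buffer: 3 bytes per pixel, row-major.
struct RgbBuffer {
    int width;
    int height;
    std::vector<unsigned char> data;

    RgbBuffer(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h * 3, 0) {}
};

// Write one pixel, computing its byte offset once and reusing it
// for the three channel writes.
inline void setPixel(RgbBuffer& buf, int x, int y,
                     unsigned char r, unsigned char g, unsigned char b) {
    assert(x >= 0 && x < buf.width && y >= 0 && y < buf.height);
    const std::size_t index = (static_cast<std::size_t>(y) * buf.width + x) * 3;
    buf.data[index]     = r;
    buf.data[index + 1] = g;
    buf.data[index + 2] = b;
}
```

The multiplication by 3 is the part that is easy to forget: the pixel index counts pixels, while the array index counts bytes.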

Thank you @dimitre !
Writing to pixels directly didn’t seem to make much difference compared to .setColor().
memcpy does look a bit faster, but it’s not a dramatic difference. Still, a nice improvement.


Hi, this is the usual bottleneck, as memory transfer from CPU to GPU tends to be slow, although it shouldn’t cost that much.
Can you post the code you are using to do this? Sometimes there are tiny tricks that help.

Ironically, I think I am doing a similar thing, and I talk about it in my recent post about a crash when uploading to an ofFbo.

So I do the following, which I think is pretty quick. You’d need to keep the pixel float data in a separate array, ready to upload.

    ofFbo showPixels; //allocate elsewhere

    //set up an array of zeroes
    int b1 = width;
    int b2 = height;
    int max = b1 * b2 * 4;  //allowing for RGBA data
    float* data = new float[max];
    memset(data, 0, sizeof(float) * max);
    //replace with data where we have it
    for(int i = 0; i<max; i++){
        //cycle through and set the pixel RGBA values in data[i] here
    }
    //send the array to the fbo texture at the right channel
    if(showPixels.isAllocated()) showPixels.getTexture(0).loadData(data, b1, b2, GL_RGBA, GL_FLOAT);
    //clear up after
    delete [] data;

Hey Sam, what are you using in setup and in loop here?
Did you allocate in the heap for some reason? Thanks

@roymacdonald The rendering function is a bit of spaghetti now and hard to quote as it’s long.
I’ve just started refactoring it so I can post it later once it’s more readable.

Setting pixels itself is done like this right now:

int index = (y * _resX + x) * 3;
memcpy(&_buffer.getPixels()[index], &pixelColor, 3);


Thank you @Sam_McElhinney_io !
If I understand correctly, you are bypassing ofPixels altogether here and just allocating the data on the heap, then once you’re done you load that data straight into the GPU texture?
So in my case I could load the data to the ofTexture that is bound to the ofImage that I want to put on screen?


That seems quite inefficient, as you are still doing one memcpy per pixel.

What @Sam_McElhinney_io suggests is a better approach. You can even still do it with an ofImage and it would work fine.
The following code runs without the fps being affected:


#pragma once

#include "ofMain.h"

class ofApp : public ofBaseApp{

	public:
		void setup();
		void update();
		void draw();

		ofImage img;
};

#include "ofApp.h"

void ofApp::setup(){
    img.allocate(1920, 1080, OF_IMAGE_COLOR);
}

void ofApp::update(){
}

void ofApp::draw(){
    // get the raw data pointer and size once, outside the loop
    auto p = img.getPixels().getData();
    auto s = img.getPixels().size();
    // just use a single value, copied to all pixels, so the calculation of it does not affect performance
    float f = ofMap(ofGetElapsedTimeMillis()%3000, 0, 2999, 0, 255 );
    for(size_t i =0; i< s; ++i){
        p[i] = (unsigned char)f;
    }
    // upload the modified pixels to the texture and draw it, as described earlier in the thread
    img.update();
    img.draw(0, 0);
    ofDrawBitmapStringHighlight(ofToString(ofGetFrameRate()), 20, 20);
}
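
The same pattern can be sketched outside openFrameworks, with a per-pixel value instead of one constant: the data pointer is fetched once per frame and then advanced through the buffer, so nothing is looked up per pixel. The std::vector stands in for the image data, and shade and renderFrame are hypothetical names, with shade being a placeholder for the real raycasting result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-pixel function standing in for the raycaster:
// returns a grey level derived from the pixel coordinates.
inline unsigned char shade(int x, int y) {
    return static_cast<unsigned char>((x + y) & 0xFF);
}

// Fill an RGB buffer row by row through a single running pointer.
// The pointer is obtained once per frame, not once per pixel.
inline void renderFrame(std::vector<unsigned char>& pixels, int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height * 3);
    unsigned char* p = pixels.data();
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const unsigned char v = shade(x, y);
            *p++ = v; // R
            *p++ = v; // G
            *p++ = v; // B
        }
    }
}
```

Walking the buffer in memory order (x in the inner loop) also keeps the writes sequential, which tends to be friendlier to the CPU cache than jumping between rows.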

Yeah, I just do a standard ofFbo allocation in setup, and then in the update loop set a timer switch to do the push of data onto the ofFbo every other second. I use fbos because I then use shaders to do other things to the pixels.

You would need some kind of array in the main thread to separately calculate and keep track of all the pixel values, but that could just be a vector of glm::vec4, or whatever. What I actually do with that is have four threads, all calculating pixel values and updating separate parts of that array in parallel; so the calculation doesn’t kill the frame rate either, but it isn’t strictly necessary.
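
A rough sketch of that kind of split, under simplifying assumptions: a single-channel buffer in a plain std::vector, threads created and joined on every call (a long-running setup would more likely keep its workers alive), and hypothetical names throughout (shadePixel, renderRows, renderParallel). Each thread writes only its own band of rows, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-pixel calculation standing in for the real one.
inline unsigned char shadePixel(int x, int y) {
    return static_cast<unsigned char>((x * 7 + y * 13) & 0xFF);
}

// Compute rows [yBegin, yEnd) of a single-channel, row-major buffer.
inline void renderRows(std::vector<unsigned char>& pixels, int width,
                       int yBegin, int yEnd) {
    for (int y = yBegin; y < yEnd; ++y) {
        for (int x = 0; x < width; ++x) {
            pixels[static_cast<std::size_t>(y) * width + x] = shadePixel(x, y);
        }
    }
}

// Split the image into horizontal bands, one per thread.
// The bands do not overlap, so the threads never write the same byte.
inline void renderParallel(std::vector<unsigned char>& pixels, int width,
                           int height, int numThreads = 4) {
    assert(numThreads > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    const int rowsPerThread = (height + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        const int yBegin = std::min(height, t * rowsPerThread);
        const int yEnd = std::min(height, yBegin + rowsPerThread);
        if (yBegin == yEnd) break; // no rows left for this thread
        workers.emplace_back([&pixels, width, yBegin, yEnd] {
            renderRows(pixels, width, yBegin, yEnd);
        });
    }
    for (auto& w : workers) w.join(); // wait until every band is done
}
```

Once renderParallel returns, the whole array is complete and can be uploaded to the texture from the main thread.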


Thank you @roymacdonald ! This looks great, I’ll try this as soon as I have a chance :)



I would like to add multithreading support at some point too, but it would be great to squeeze more performance out of a single thread first.