FASTEST way of reading FBO pixels or screen pixels, please read!

Hello everyone. As the title says, I’m trying to find the fastest way of getting access to an FBO’s pixels or to a part of the screen. I want this because I’m trying to optimise part of a finger/blob tracking application for optical multitouch that I’m developing. The key features of the app are that it’s very fast (or at least that’s what I’m aiming for) and that you’ll be able to place your webcam facing the screen or table at an angle. For that to work properly it must correct the input image’s perspective, and this is where the bottleneck of the app is: once the perspective has been adjusted, I need to run the blob finding algorithm on the new perspective-corrected image. So far I’ve tried 2 things. The first is to draw this image to the screen and then use a texture to grab the part of the screen where the image was drawn. I’m doing this with:

  
  
//              ofPixels pix and unsigned char *pixels_pointer were previously defined  
//              at this point in the program, the original image has been drawn on the left and the new image on the right  
                textura.loadScreenData(video_width,0,video_width,video_height);  
                textura.readToPixels(pix);  
                pixels_pointer = pix.getPixels();  

Just these 3 lines take about 12-15 milliseconds on average to complete, while the rest of the program (everything before and after these lines) takes 1 millisecond on average (on a 320x240 pixel input image). The other thing I’ve tried is, instead of drawing the new image on screen, to draw it to an FBO and then access the FBO’s pixels with:

  
	fbo.readToPixels(pix);  
	pixels_pointer = pix.getPixels();  

And once again, just those 2 lines take about the same time as when using a texture. My question is: why is it so slow? I read somewhere in the forum that this happens when copying between GPU memory and system memory, but my computer has an integrated GPU, which means there’s no dedicated video memory (well, there is, but it’s shared with normal memory), so it shouldn’t be that slow, right? Is there any other way of doing this that’s faster? I remember there was an ofxFBO addon that had a getPixels() function. Did anyone ever get to try it? Was it fast? I’ve run out of options, people, that’s why I’m asking, so please, openFrameworks community, help!

Reading from GPU memory is really slow. You can read asynchronously by using a PBO, but that only keeps your application from blocking while the texture is read; the time the read takes would be the same. You can do the perspective correction on the CPU using ofxOpenCv, which is slow, but for a small image it should be faster than reading from GPU memory.
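To give an idea of what correcting the perspective on the CPU involves, here is a minimal sketch in plain C++ that warps a grayscale image by inverse mapping with nearest-neighbour sampling. It is not the ofxOpenCv implementation; `warpPerspectiveGray` and the row-major 3x3 matrix `Hinv` are hypothetical names chosen for illustration, and `Hinv` is assumed to map destination pixels back to source pixels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of openFrameworks or ofxOpenCv).
// Hinv: row-major 3x3 homography mapping DESTINATION pixel coordinates
// back to SOURCE pixel coordinates (assumed to be supplied by the caller).
std::vector<unsigned char> warpPerspectiveGray(const std::vector<unsigned char>& src,
                                               int srcW, int srcH,
                                               int dstW, int dstH,
                                               const double Hinv[9]) {
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);
    std::vector<unsigned char> dst(static_cast<std::size_t>(dstW) * dstH, 0);
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            // Homogeneous coordinate; skip points that map to infinity.
            double w = Hinv[6] * x + Hinv[7] * y + Hinv[8];
            if (std::fabs(w) < 1e-12) continue;
            double sx = (Hinv[0] * x + Hinv[1] * y + Hinv[2]) / w;
            double sy = (Hinv[3] * x + Hinv[4] * y + Hinv[5]) / w;
            // Nearest-neighbour sampling; pixels outside the source stay black.
            int ix = static_cast<int>(std::floor(sx + 0.5));
            int iy = static_cast<int>(std::floor(sy + 0.5));
            if (ix < 0 || iy < 0 || ix >= srcW || iy >= srcH) continue;
            dst[static_cast<std::size_t>(y) * dstW + x] =
                src[static_cast<std::size_t>(iy) * srcW + ix];
        }
    }
    return dst;
}
```

For a 320x240 image this is 76,800 iterations of a few multiplications and one division each, and it never touches the GPU, so there is no readback at all.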

Another solution that is faster is to do the blob detection on the distorted image and then apply the perspective correction on the CPU to the centers of the blobs only, instead of to the whole image.

you can use for example the methods described here:

http://forum.openframeworks.cc/t/quad-warping–homography-without-opencv/3121/0

to get a 4x4 matrix and then multiply the points of the blob centroids by that matrix
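As a rough sketch of that last step, assuming the 4x4 matrix is stored in column-major order (the OpenGL convention) and that the blob centroids are plain 2D points with z = 0, the multiplication plus perspective divide could look like this. `Point2`, `applyMatrix` and `correctCentroids` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };

// m: 4x4 matrix in column-major order (assumption: OpenGL layout),
// so the element at row r, column c is m[c * 4 + r].
// The point is treated as (x, y, 0, 1); the z column is therefore unused.
Point2 applyMatrix(const double m[16], Point2 p) {
    double x = m[0] * p.x + m[4] * p.y + m[12];
    double y = m[1] * p.x + m[5] * p.y + m[13];
    double w = m[3] * p.x + m[7] * p.y + m[15];
    assert(std::fabs(w) > 1e-12);   // w == 0 would be a point at infinity
    return Point2{ x / w, y / w };  // perspective divide
}

// Correct every blob centroid; cost is proportional to the number of blobs,
// not to the number of pixels in the image.
std::vector<Point2> correctCentroids(const double m[16],
                                     const std::vector<Point2>& centroids) {
    std::vector<Point2> out;
    out.reserve(centroids.size());
    for (std::size_t i = 0; i < centroids.size(); ++i) {
        out.push_back(applyMatrix(m, centroids[i]));
    }
    return out;
}
```

With a handful of fingers on the table this is a few dozen multiplications per frame, compared with warping all 76,800 pixels of a 320x240 image.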

Also, even if your GPU has no dedicated memory, the problem is that in order to read the contents of the texture the program has to block until every previous OpenGL call has finished. Usually when you draw things or send texture data using OpenGL, that doesn’t happen immediately. Instead, OpenGL accumulates all those instructions and executes them whenever it can, in the order that makes things faster. When you try to read from the GPU, it has to finish all the previous calls before being able to read, which makes any instruction that reads from the GPU really slow.

Thanks a lot for the reply arturo. I’ve been reading a bit about pixel buffer objects and yeah, you’re right: the time it takes to read isn’t really going to change much. What changes is that your program can do something else while the read takes place, so unless the app has something else to do it won’t be able to take advantage of it. That kinda sucks, because in my app there’s really nothing else to be done while waiting for the pixels pointer. Doing the perspective correction on the blob centroids is actually a great idea though, I’ll try that next.

Btw, I think how slow the readToPixels() function is depends completely on the hardware. For instance, the first results I got were with my old and trusty 2009 white MacBook. Its specs are: Processor 2.13 GHz Intel Core 2 Duo, Memory 4 GB 667 MHz DDR2 SDRAM and Graphics NVIDIA GeForce 9400M 256 MB. This machine does not have dedicated graphics memory; those 256 MB are shared with main memory. I was kinda wrong when I said this machine had an iGPU: the GPU and CPU are independent chips, so in reality it does have a discrete GPU but lacks the usual, much faster GDDR memory found on discrete graphics cards. Does that make sense?

Now, I tested the same program on my Ubuntu box. That machine has an Intel Celeron CPU and I believe it has Intel’s HD Graphics 2000 GPU, which afaik resides on the same die as the CPU; it also has DDR3 memory. The same program gave me 2 ms for the FBO, and slightly higher values for the texture and image screen reads, though still lower than on my MacBook. I don’t know if the reason was the faster DDR3 RAM, the CPU with integrated graphics (this one really has an iGPU) or a combination of both. In case anyone is interested in testing this, I’ll leave the test program I used.

main.cpp:

  
#include "ofMain.h"  
#include "benchmarcador.h"  
  
#define width 320  
#define height 240  
  
class testApp : public ofBaseApp{  
      
public:  
      
    ofVideoPlayer movie;  
    benchmarker bfbo, bimage, btexture;  
    ofTexture text;  
    ofImage pic;  
    ofFbo f;  
    ofPixels pix;  
    string s;  
    unsigned char * pixels_pointer;  
    int caso;  
    unsigned long long tiempo;  
      
    void setup() {  
        caso = 1;  
        text.allocate(width, height, GL_RGB);  
        f.allocate(width, height, GL_RGB);  
        pic.allocate(width, height, OF_IMAGE_COLOR);  
        movie.loadMovie("your_movie.mov");  //  change the #define statements at the beginning to match your movie's dimensions  
        movie.play();  
    }  
    void update() {  
        movie.update();  
    }  
    void draw() {  
          
        switch (caso) {  
            case 1:  
                f.begin();  
                movie.draw(0,0);  
                f.end();  
                f.draw(width,0);  
                s = "fbo";  
                bfbo.start();  
                f.readToPixels(pix);  
                pixels_pointer = pix.getPixels();  
                tiempo = bfbo.end();  
                break;  
            case 2:  
                movie.draw(0,0);  
                s = "image";  
                bimage.start();  
                pic.grabScreen(0, 0, width, height);  
                pixels_pointer = pic.getPixels();  
                tiempo = bimage.end();  
                pic.draw(width,0);  
                break;  
            case 3:  
                movie.draw(0,0);  
                s = "texture";  
                btexture.start();  
                text.loadScreenData(0,0,width,height);  
                text.readToPixels(pix);  
                pixels_pointer = pix.getPixels();       //  we can also use the [] operator to access the pixels  
                tiempo = btexture.end();  
                text.draw(width,0);  
                break;  
        }  
        ofDrawBitmapString(ofToString(tiempo)+" "+s, 10,15);  
    }  
      
      
    void keyPressed(int key) {  
        if (key == ' ') {  
            caso = (caso > 2) ? 1 : caso + 1;  
              
        }  
    }  
    void keyReleased(int key) {}  
    void mouseMoved(int x, int y ) {}  
    void mouseDragged(int x, int y, int button) {}  
    void mousePressed(int x, int y, int button) {}  
    void mouseReleased(int x, int y, int button) {}  
    void windowResized(int w, int h) {}  
    void gotMessage(ofMessage msg) {}  
    void dragEvent(ofDragInfo dragInfo) {}  
};  
  
  
int main( ){  
	ofSetupOpenGL(width*2,height,OF_WINDOW);  
	ofRunApp(new testApp());  
}  
  
  

benchmarcador.h:

  
#ifndef getPixels_pruebas_benchmarcador_h  
#define getPixels_pruebas_benchmarcador_h  
  
  
#include <tr1/array>  
#include "ofMain.h"  
  
class benchmarker {  
public:  
    unsigned long long prev;  
    unsigned long long sumatoria;  
    unsigned int promedio;  
    std::tr1::array<int, 100> samples;  
      
    benchmarker() {  
        memset(&samples, 0, sizeof(samples));  // size in bytes; samples.size() would only clear the first 100 bytes  
    }  
      
    void start() {  
        prev = ofGetElapsedTimeMillis();  
    }  
      
    unsigned int end() {  
        // shift every sample one slot to the left; stop one short so samples[i+1] stays in bounds  
        for (int i=0; i+1<samples.size(); i++) {  
            samples[i] = samples[i+1];  
        }  
        samples[samples.size()-1] = ofGetElapsedTimeMillis() - prev;  
        sumatoria = 0;  
        for (int i=0; i<samples.size(); i++) {  
            sumatoria += samples[i];  
        }  
          
        promedio = sumatoria/samples.size();  
          
        return promedio;  
    }  
      
};  
  
  
#endif  
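The benchmarker above shifts all 100 samples on every call and measures in whole milliseconds, which is coarse when the rest of the program takes about 1 ms. A possible variant, sketched here under the assumption that a C++11 compiler is available, keeps a ring buffer with a running sum and uses `std::chrono::steady_clock` for microsecond resolution. `RollingTimer` and `addSample` are hypothetical names, and unlike the original it averages only over the samples recorded so far:

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Hypothetical replacement for the benchmarker class (assumes C++11).
class RollingTimer {
public:
    void start() { begin_ = std::chrono::steady_clock::now(); }

    // Stops the clock and returns the rolling average in microseconds.
    double end() {
        std::chrono::duration<double, std::micro> elapsed =
            std::chrono::steady_clock::now() - begin_;
        return addSample(elapsed.count());
    }

    // Records one sample (in microseconds) and returns the average
    // of the samples currently held in the ring buffer.
    double addSample(double micros) {
        sum_ += micros - samples_[next_];       // new value replaces the oldest
        samples_[next_] = micros;
        next_ = (next_ + 1) % samples_.size();
        if (count_ < samples_.size()) ++count_;
        return sum_ / static_cast<double>(count_);
    }

private:
    std::array<double, 100> samples_{};         // value-initialised to zero
    std::chrono::steady_clock::time_point begin_{};
    double sum_ = 0.0;
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

Usage would be the same as before: call `start()` right before `f.readToPixels(pix)` and `end()` right after. Note that the CPU timer still includes the time spent waiting for queued OpenGL calls to finish, so it measures the stall plus the transfer.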
  
  

Sorry for mixing Spanish and English in my code. I don’t think this will upset anyone, though the other day someone did get a bit mad at me for doing so, haha. I guess I’ll eventually decide on using one language for my code :wink: