Hardware recommendation for reading from GPU back to CPU (fbo.getPixels())

I’ve built an application that runs very quickly (~50fps) on my 2013 MBP 15" with an NVIDIA 650M GPU and Core i7 CPU. I’m using ofxFastFboReader to pull an fbo back to the CPU every frame (using ofxFlowTools, which is beautiful). I’ve run this same application on an older Mac mini (2011 with Intel HD Graphics, Core i5) and couldn’t get above 9fps.

I know nothing about how the fbo reader works. What hardware is responsible for making ofxFastFboReader run? If I am spec’ing out a new computer to run this application for an installation, what should I be looking for? Beefy GPU? Beefy CPU? Special PCI bus? I’d like to keep costs down, so if something with a big GPU but smaller CPU works, that would be nice.


ofxFastFboReader uses PBOs to speed up pixel reading, so it depends on the GPU.
To learn how it works, you can have a look here: https://www.opengl.org/wiki/Pixel_Buffer_Object

there’s an easy way to avoid stalling the gpu when reading back from it: just wait till the frame is ready and then read it back. you can even do it on a different thread, which would avoid any possible stall. to do that you usually use a ring buffer of PBOs instead of the 1 or 2 that ofxFastFboReader uses.

that way you give time to the gpu to write the frame and read it back at the cost of introducing some latency.
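
To make the ring idea concrete, here is a minimal sketch of just the slot bookkeeping in plain C++. It makes no OpenGL calls, and `ReadbackRing` and `submit` are made-up names for illustration, not part of ofxFastFboReader or openFrameworks. Each slot stands in for one PBO, and the comments mark where the real GL calls would probably go.

```cpp
#include <cstddef>
#include <vector>

// Plain C++ model of a ring of PBOs: only the index bookkeeping, no OpenGL.
// Each slot stores the number of the frame whose download was started into it,
// or -1 if the slot has never been used. All names here are hypothetical.
class ReadbackRing {
public:
    // slots must be >= 1; one slot behaves like a synchronous read.
    explicit ReadbackRing(std::size_t slots) : frames_(slots, -1) {}

    // Call once per frame. Starts the "download" of `frame` and returns the
    // frame whose download was started (slots - 1) frames earlier, or -1 if
    // the ring has not filled up yet.
    int submit(int frame) {
        frames_[head_] = frame;                 // real code: glReadPixels into PBO[head_]
        head_ = (head_ + 1) % frames_.size();   // advance to the oldest slot
        // The slot that will be overwritten next holds the oldest download,
        // i.e. the one that has had the most time to finish.
        return frames_[head_];                  // real code: map PBO[head_] and copy out
    }

private:
    std::vector<int> frames_;
    std::size_t head_ = 0;
};
```

With 3 slots, `submit(0)` and `submit(1)` return -1, `submit(2)` returns 0 and `submit(3)` returns 1, so the pixels you get are always 2 frames old. More slots give the gpu more time per download at the price of more latency.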

there’s an example on how to read back on a different thread on the nightly builds in gl/threadedPixelBufferExample. that uses only 2 buffers but should be relatively easy to modify it to use more than 2


Thanks for the recommendations guys. I did some exploring and reading and testing… it looks like ofxFastFboReader was working as well as a ring buffer of PBOs (I think ofxFastFboReader uses 3 buffers by default, although I might have misunderstood). It looks like this application just requires a lot of GPU beef. The ofxFlowTools rendering requires horsepower, as does the PBO read back stuff. We ended up building a new machine with a discrete GPU.

I want to revive this thread. I am working with a very large texture I need to read to pixels, 32216*960. I am using a 1080ti and drawing stuff is easy and there is plenty of GPU power left. However, reading to pixels is slow. I have tried ofxFastFboReader as well (actually I am working with ofxNDI, which does move data from the GPU to the CPU but very efficiently, much faster than anything else I tried). If I can make this work I can get great hardware, but I am trying to understand exactly where the bottleneck is.

When I monitor my hardware I am only using 48% of my GPU and 2.8 gig of 11 gig of video memory. My cpu utilisation is at about 35% on every core (of 8).

This system is relatively old, with the 1080ti running in a PCIe 3 slot and a 5 year old motherboard.

What feature of hardware (Motherboard, cpu, gpu) would accelerate the moving of data from the GPU to the CPU?


The speed here has nothing to do with the GPU cores but with the bus that connects the ram and the gpu, so the PCIe bus and how many of its lanes the card uses. New cards will be faster of course, but in general downloading is not super optimized since it’s not very common.
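
A rough way to see how close a readback is to the bus limit is to work out the bytes per second it needs. This is only a back-of-the-envelope sketch: the 4 bytes per pixel (RGBA, 8 bits per channel) and the 60 fps target are assumptions, not figures from this thread, the function names are made up, and the PCIe numbers are theoretical per-direction maxima that real transfers will not reach.

```cpp
#include <cstdint>

// Bytes needed for one frame, assuming tightly packed pixels.
constexpr std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Sustained rate in GB/s (1 GB = 1e9 bytes) needed to read back every frame.
constexpr double requiredGBps(std::uint64_t bytesPerFrame, double fps) {
    return static_cast<double>(bytesPerFrame) * fps / 1e9;
}

// Theoretical PCIe 3.0 bandwidth per direction in GB/s:
// 8 GT/s per lane, 128b/130b encoding, 8 bits per byte.
constexpr double pcie3GBps(int lanes) {
    return lanes * 8.0 * (128.0 / 130.0) / 8.0;
}
```

For the 32216*960 texture mentioned above, `frameBytes(32216, 960, 4)` is 123,709,440 bytes, about 124 MB per frame. Under the assumed 60 fps that needs roughly 7.4 GB/s, against a theoretical 7.9 GB/s for `pcie3GBps(8)` and 15.8 GB/s for `pcie3GBps(16)`.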

In general using PBOs to download from the gpu’s memory to ram is not faster, it’s just asynchronous. Instead of having to wait for the download to finish, these addons start the download and return immediately, getting the results the next frame, which masks the download time by introducing latency.

A possible solution, unless you need short latencies, is to wait more than one frame, which is usually what these addons do. Or even query the pbo to see if it’s ready and continue if not. Doing it on a different thread and sending a signal, or even the frame itself, over a thread channel whenever it’s ready can also be a solution.
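
The thread channel hand-off could look roughly like the sketch below. It uses only the C++ standard library: `Channel` is a made-up stand-in for a thread channel (openFrameworks has its own ofThreadChannel for this), the "downloader" thread fabricates frames instead of mapping PBOs, and `collectFrames` is a hypothetical helper that exists only to show the flow.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking channel: send() never waits, receive() waits for a value.
template <typename T>
class Channel {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();  // wake a waiting receiver
    }

    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};

using Frame = std::vector<unsigned char>;

// Runs a stand-in download thread that sends `frameCount` frames of
// `frameSize` bytes (frameSize must be >= 1), each filled with its frame
// number, and returns the first byte of every frame in arrival order.
std::vector<int> collectFrames(int frameCount, std::size_t frameSize) {
    Channel<Frame> channel;
    std::thread downloader([&channel, frameCount, frameSize] {
        for (int i = 0; i < frameCount; ++i) {
            // real code: wait for the PBO, map it, copy the pixels out
            channel.send(Frame(frameSize, static_cast<unsigned char>(i)));
        }
    });

    std::vector<int> tags;
    for (int i = 0; i < frameCount; ++i) {
        Frame frame = channel.receive();  // blocks until a frame is ready
        tags.push_back(frame[0]);
    }
    downloader.join();
    return tags;
}
```

In a real app the main thread would more likely poll with a non-blocking receive once per frame rather than block, so drawing never waits on the download.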

Thanks for the tip. I read a little more, and it hinted that the speed of the GPU ram can also help. I had my GPU connected at 8x (PCIe 3) as I have another card inside the machine that shares resources. Pulling out the other card and running at 16x did nothing, so it seems to be the card that is limiting me, not the connection speed to the motherboard (unless there is a direct bottleneck in my motherboard).

Latency is not an issue in this case. The addon I am using has a dual buffer using PBOs, so I will try increasing this to a quad buffer and see if I can get more speed.