Image sequencer of large images has slow framerate

First time poster, thanks to everyone who contributes.

My machine specs:

  • 4 GHz i7 with 4 cores (can get a 6 core unit if needed)
  • Samsung 850 Pro SSD
  • 32 GB RAM (can get 64 GB if needed)
  • (2) NVIDIA Quadro M5000, 8 GB memory each (running Mosaic)
  • NVIDIA Sync Card
  • (8) 55" 1080p displays, side by side in portrait mode for a resolution of 8682x1920 (42px are bezel correction)
  • Windows 7 Pro (can go Windows 10 if it helps)

The gist of my project is an image sequence that has 720 images that circle around a center point, as if you were walking around it looking at the center point. Two images for every degree. At the next phase, this will be controlled with a rotary encoder, but for now I just need to get it working in a continuous cycle at 60fps.

Where this goes from a very simple task to a more complicated one is that each image is a compressed JPG at 8682x1920. I have determined that each image decompresses to about 50 MB, so the total is around 36 GB. I plan to have this lazy load files and swap textures in and out of the graphics card as needed, but for now I just need a solid proof of concept at full resolution. Here is my workflow:

I load a much smaller set of images, maybe 180 (3 seconds of footage), which should fit nicely in both RAM and the two Quadros' 16 GB of combined memory. When I do this, the playback peaks at 60fps for a few seconds, then throttles down to ~20-24fps. If I toggle the direction of the playback (it goes forward and backward), it speeds back up for a few seconds, then drops again. I’m storing all of the images in a vector<ofImage> and not deleting or modifying anything after they are in there. I’ve also tried storing them all in a map<int, ofImage> and that didn’t change anything, good or bad.
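The memory figures in this post can be checked with a few lines of arithmetic. This is a minimal sketch, assuming tightly packed 8-bit RGB (3 bytes per pixel, no padding or mipmaps); the function and constant names are made up for illustration.

```cpp
#include <cstdint>

// Bytes for one decoded frame, assuming tightly packed pixels with no row padding.
constexpr std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Bytes for a whole sequence of identically sized frames.
constexpr std::uint64_t sequenceBytes(std::uint64_t frames, std::uint64_t perFrame) {
    return frames * perFrame;
}

// Figures from the post; 3 bytes per pixel (8-bit RGB) is an assumption.
constexpr std::uint64_t kFrame = frameBytes(8682, 1920, 3);   // 50,008,320 bytes, ~50 MB
constexpr std::uint64_t kFull  = sequenceBytes(720, kFrame);  // 36,005,990,400 bytes, ~36 GB
constexpr std::uint64_t kTest  = sequenceBytes(180, kFrame);  // 9,001,497,600 bytes, ~9 GB
```

Under these assumptions the 180-image test set alone comes to roughly 9 GB of decoded pixels. An ofImage with its texture enabled keeps the pixels in RAM and also a texture on the GPU, so that amount is likely needed in both places.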

If I can get it working with enough images that should clearly fit inside of memory, I will incorporate setUseTexture and swap in and out as needed (I’m assuming that’s how that works), but for now I just need it working at its most basic. I have working versions at smaller file dimensions in both OF and an HTML5/canvas version, so I know the code is at least decent.

I can post code samples if necessary, but they aren’t really anything complicated. Honestly, it’s the complicated stuff that I don’t know. My experience with C++ is very intermediate, a notch above beginner.

Thanks again.

It’s not very clear whether you are loading all the images at the beginning or while playing them back. In each case the bottleneck might be in a different part.

Thanks @arturo. In this test I am loading them all up front (e.g. images[i].load("foo.jpg")), and I think that should be OK, since I want them all loaded before playback begins. Do you prefer a different way?

Image compression is for storage. Once loaded, images with the same dimensions and pixel format take up the same amount of memory, regardless of whether they were stored as JPG, PNG or TIFF.

Considering you have so many large images, maybe you can save some loading time by storing them in an uncompressed format (this will be important for my third suggestion). I’ve never run speed tests in OF though, it’s just a thought.

Since you are not going to manipulate the images on the CPU, I think you could just use a vector of ofTexture. It would free you a load of memory.

With something as heavy as this, I think it is mandatory to load the images dynamically. Please check the example in examples/addons/threadedImageLoaderExample. I believe it is something you can build on.

One last thing, I don’t know if OF is optimized for dual GPUs (@arturo?). If it’s not, then you should also check the performance with only one GPU. As an example, even AAA games that are not optimized for multiple GPUs run way worse on them.

Hope this is helpful, keep up the good work and have fun! :)

Yes, I’ve never worked with 2 cards, but I would expect it is not as simple as adding up the memory of both. At some point the card that is doing the output will need the texture anyway, and the transfer from one card to the other might be slowing things down.

The problem you describe, where playing backwards makes it faster for a few seconds, sounds like the graphics memory is full, so textures have to be moved from the graphics card to the computer’s memory and the next one loaded from RAM to the graphics card. When playing backwards, the next few textures are already on the graphics card, so it doesn’t need to move memory around for a while.
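One way to sanity-check the "memory is full" explanation is to estimate how many decoded frames fit on a single card. The sketch below assumes 8-bit RGB frames at 8682x1920, that "8gb" means 8 GiB, and that all of it is available for textures (optimistic, since framebuffers and the driver also use video memory). The names are hypothetical.

```cpp
#include <cstdint>

// How many whole frames fit in a memory budget (integer division rounds down).
constexpr std::uint64_t framesThatFit(std::uint64_t budgetBytes, std::uint64_t frameBytes) {
    return frameBytes == 0 ? 0 : budgetBytes / frameBytes;
}

constexpr std::uint64_t kGiB = 1024ULL * 1024ULL * 1024ULL;

// Assumed frame size: 8682 x 1920 pixels, 3 bytes per pixel, no padding.
constexpr std::uint64_t kFrameBytes = 8682ULL * 1920ULL * 3ULL;

// Upper bound for one 8 GiB card under the assumptions above.
constexpr std::uint64_t kFitOneCard = framesThatFit(8 * kGiB, kFrameBytes);  // 171
```

Under these assumptions a single card holds at most 171 frames, fewer than the 180 in the test, which would be consistent with textures being paged in and out during playback.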

I would try preloading the images in RAM without textures, using ofLoadImage to load into ofPixels. Then when you need to show an image, always upload it to the same texture. You could use a PBO to upload asynchronously if that’s still slow, but I don’t think it will be.

As a first step, perhaps try loading the 180 images to RAM and check the memory usage with the Task Manager to make sure there’s no swapping to disk, then try to load dynamically from disk.

Loading uncompressed at that size is impossible (about 50 MB per frame at 60fps is roughly 3 GB/s) unless you have a RAID of probably more than 4 SSDs, and even with that you would need to optimise reading from disk a lot. Plain ofLoadImage would still be too slow, I imagine.

Finally, if you can just use a video instead of an image sequence, that might save you some headaches. The default video player on Windows is slow, but ofxGStreamer should play back at that size without problems.

Thank you @hubris and @arturo. I will try implementing your suggestions and report back.

@hubris - regarding ofxThreadedImageLoader, I have a version that uses it to work through all 720 images, pre-caching 50-100 images in each direction and removing images that aren’t needed anymore, but I find that every time loadFromDisk() is called, it appears to block. I had it running in update(), so it drastically slowed the framerate. I also tried calling it every 10/20/30 ticks, and it had the same end result of blocking during that time. Is update() not the preferred place for it? If not, do you have a recommendation?

Again, thanks to both of you for helping with this, OF has been a great learning experience and very fun.
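The pre-caching window described above (keep the frames near the current one, in both directions, on a sequence that wraps around) can be sketched with plain index arithmetic. This is only an illustration, not code from the project; wrapIndex, circularDistance, inWindow and framesToKeep are hypothetical names, and it assumes the window is no larger than the sequence.

```cpp
#include <cassert>
#include <vector>

// Wrap any index (including negative ones) into [0, total).
inline int wrapIndex(int index, int total) {
    assert(total > 0);
    int r = index % total;
    return r < 0 ? r + total : r;
}

// Shortest distance between two frames on the circular sequence.
inline int circularDistance(int a, int b, int total) {
    int d = wrapIndex(a - b, total);
    return d < total - d ? d : total - d;
}

// A frame should stay resident if it lies within `radius` frames of `current`.
inline bool inWindow(int frame, int current, int radius, int total) {
    return circularDistance(frame, current, total) <= radius;
}

// Frames to have loaded around `current`, nearest first, in both directions.
// Assumes the whole window fits in the sequence, so no index appears twice.
inline std::vector<int> framesToKeep(int current, int radius, int total) {
    assert(radius >= 0 && 2 * radius + 1 <= total);
    std::vector<int> out;
    out.push_back(wrapIndex(current, total));
    for (int step = 1; step <= radius; ++step) {
        out.push_back(wrapIndex(current + step, total));
        out.push_back(wrapIndex(current - step, total));
    }
    return out;
}
```

With 720 frames and a radius of 50, framesToKeep yields 101 indices. Anything for which inWindow returns false is a candidate for eviction, and ordering the list nearest first means the most urgent frames are requested first.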

I had to do this on my iMac at home tonight, so my testing abilities are quite a bit lower, but I had great success using a vector<ofPixels> and loading them into a single instance of ofTexture. Playback was nice and smooth once all of the pixels were loaded.

However, I still ran into the same issue as with ofImage, which is that I couldn’t load all of the pixel instances before running out of memory. And my method of calling pixelVector[unusedPixels].clear() and ofLoadImage(pixelVector[newPixels], "filepath") in update() caused the same slow, blocking effect as ofImage.load() and ofxThreadedImageLoader.loadFromDisk() did.

I’m trying to understand how ofxPBO works, but I don’t get how it loads from disk, and how it would do it asynchronously. Would I need to call it from an ofThread?

PBOs would be used to upload to the graphics card faster, not to load from disk, so I don’t think you need that. You need to do the clear and ofLoadImage on a different thread. You can probably do the loading on a different thread and then use an ofThreadChannel to send the loaded pixels back to the main thread.
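The pattern suggested here (decode on a worker thread, hand the pixels back through a channel, poll without blocking on the main thread) can be sketched with only the C++ standard library. Everything below is a stand-in: Channel plays the role of ofThreadChannel, LoadedFrame::pixels the role of ofPixels, and the decode callback is where ofLoadImage would be called. FrameLoader and its methods are hypothetical names, not part of openFrameworks.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal thread-safe queue; a stand-in for ofThreadChannel, not its implementation.
template <typename T>
class Channel {
public:
    void send(T value) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push(std::move(value)); }
        ready_.notify_one();
    }
    // Blocks until a value arrives; returns false once closed and drained.
    bool receive(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }
    // Never blocks; meant to be called once per frame from the main thread.
    bool tryReceive(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }
    void close() {
        { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
        ready_.notify_all();
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

struct LoadedFrame {
    int index;
    std::vector<unsigned char> pixels;  // stand-in for ofPixels
};

// Hypothetical decode callback: turns a frame index into decoded pixels.
using DecodeFn = std::function<std::vector<unsigned char>(int)>;

class FrameLoader {
public:
    explicit FrameLoader(DecodeFn decode)
        : decode_(std::move(decode)), worker_([this] { run(); }) {}
    ~FrameLoader() {
        requests_.close();  // worker finishes pending requests, then exits
        worker_.join();
    }
    void request(int index) { requests_.send(index); }                   // main thread
    bool poll(LoadedFrame& out) { return results_.tryReceive(out); }     // main thread
private:
    void run() {  // worker thread: the slow decode happens here, off the main thread
        int index = 0;
        while (requests_.receive(index)) {
            LoadedFrame frame;
            frame.index = index;
            frame.pixels = decode_(index);
            results_.send(std::move(frame));
        }
    }
    DecodeFn decode_;
    Channel<int> requests_;
    Channel<LoadedFrame> results_;
    std::thread worker_;  // declared last so the channels exist before it starts
};
```

In an OF app, update() would presumably call poll() each frame and upload any frame that has arrived to the single texture, so the only work left on the main thread is the texture upload.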