Video Delays, Performance, and External Hardware

Recently I made a piece with oF wherein I controlled a webcam’s image with five different delay lengths using a MIDI violin in real time. I asked a lot of questions here and you were all of great help. It was a difficult problem, but I was able to do it using ofxTurboJpeg, recommended by @zach, and PureData for the audio, linking the two via OSC.

Aside from the delay, the only effect I used was a blur shader, applied to each of the five images individually.

I’m very happy with how it turned out. However, I’ve hit a technical wall, with it only running at a stable 20FPS on my computer. Trying to do anything else with it in the future will be difficult without further lowering the framerate.

So one idea is to handle the video delay buffers with external hardware to take some of the burden off of my computer.

@fresla, you’ve encouraged me to try using regular video cameras (i.e. not webcams) with Blackmagic HyperDecks for the delay. I guess I’d need more than the two recorders you mentioned in your post to realize multiple delays at once?

Another idea is to start learning Raspberry Pi and hook up cameras to that to handle the delay. It might take more time to get working, but I’d acquire a new skill and the devices would be more customizable.

However, I just now tested the performance of my app to estimate how it would perform if I were to handle the delays with external hardware…

I turned off the recording of JPEGs, so now it’s only loading pre-recorded ones. I also removed the webcam from the code. The resulting highest stable frame rate is 25fps with 5 images drawn at once, a modest 5fps improvement.

Loading the JPEGs but not applying the blur and not drawing them gives a result of 45fps.

So the question is: Would bringing in 5 pre-delayed images via capture card make the project run substantially faster than using ofxTurboJpeg on my computer?

Also, would such a performance improvement hold true even if using a laptop, which would require a USB-HDMI adapter? (in that case, I could try to send a 4k image into oF and then divide it up there)

I was recently reading about Nam June Paik and Abe Shuya’s Video Synthesizer, which they made in the 1960s and which processed seven video inputs at once… of course, it was a gigantic device, but it made me more cognizant of how powerful specialized hardware can be compared to even today’s standard laptops. Just eager for advice on what hardware is best to combine with oF’s powerful creative toolkit.

one question here: when you mention loading the JPEGs, do you mean playing a video transcoded from the JPEGs, or playing the images themselves?

The way I’ve done it using only my laptop and a webcam is to convert the webcam input into jpeg sequences, saving a jpeg every frame and loading multiple jpegs every frame (at different indices to get different delays). This seemed like the best option available to me. (and ofxTurboJpeg helped a lot)
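A minimal sketch of the index arithmetic described above, assuming frames are numbered 0, 1, 2, ... as they are saved and the capture rate is fixed. The function names are hypothetical and not taken from the original project:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Convert a delay in seconds to a whole number of frames at a fixed capture rate.
std::int64_t secondsToFrames(double seconds, double fps) {
    assert(seconds >= 0.0 && fps > 0.0);
    return static_cast<std::int64_t>(seconds * fps + 0.5); // round to nearest frame
}

// Index of the saved frame to load for a given delay.
// Returns -1 if the delay reaches back before the first recorded frame.
std::int64_t delayedFrameIndex(std::int64_t newestFrame, std::int64_t delayFrames) {
    assert(delayFrames >= 0);
    const std::int64_t index = newestFrame - delayFrames;
    return index < 0 ? -1 : index;
}

// Resolve several delays at once, one entry per image drawn in a frame.
std::vector<std::int64_t> delayedFrameIndices(std::int64_t newestFrame,
                                              const std::vector<double>& delaysSeconds,
                                              double fps) {
    std::vector<std::int64_t> indices;
    indices.reserve(delaysSeconds.size());
    for (double seconds : delaysSeconds) {
        indices.push_back(delayedFrameIndex(newestFrame, secondsToFrames(seconds, fps)));
    }
    return indices;
}
```

Each returned index would then be turned into a file name and handed to whatever JPEG loader is in use; a result of -1 means there is nothing to draw yet for that delay.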

As for how I would achieve a similar result using external hardware to improve performance, I’m not sure, hence this thread.

N.B. For sound, I was able to record and play back WAV files easily in Pure Data and then send frame indices from there via OSC to oF to control the JPEG playback.

what resolution are your 5 jpeg streams, how are they composited in the draw() function, and what is the final rendering resolution/framerate?

The images are 1280x720 to help performance. Otherwise they’d be 1920x1080.

The images are alpha-blended with each other (additive)

The output resolution is 1920x1080. The frame rate that I set oF to is the highest stable rate I can get (currently 20), i.e. if I set it higher, the fps fluctuates.

Honestly, I’m not sure if getting an external video device will help much here. An external capture card (with a proper camera) will only give you the video feed without overheads, and you’ll still need to make buffers out of it. At least as far as I know, there’s no obvious way to hook up one camera to multiple capture devices to create delayed buffers.

This sounds like an optimisation problem to me. First things first, just to get it out of the way - you are running a Release build, right? The differences between a Debug build and Release build can be MAJOR and you wouldn’t be the first one making this rookie mistake, we’ve all been there :)

I would be interested to know how you’re dealing with the delayed frames. You’ve mentioned using ofxTurboJpeg (I’ve never used it) but I’m wondering if it would be a smarter approach to use the regular old videoGrabber and get the textures into a vector (I’m assuming you’ve tried ofImage or ofPixels). When it comes to manipulating the delay buffer and using a blur, I’d then look into using a shader for it. This way, the entire image pipeline lives on the GPU (and your gaming laptop should be easily able to handle that).

Finally, even without getting into shaders - if you’re doing multiple operations on vectors of images, are you using references and pointers? If you’re not, that will again add a lot of overhead.
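To make the cost of copying concrete, here is a small sketch in plain C++. `Frame` is a hypothetical stand-in for an image type (it is not an openFrameworks class) that counts how often it is copied, so the difference between passing a vector by value and by const reference becomes visible:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an image; counts copies of its pixel data.
struct Frame {
    static int copies;
    std::vector<unsigned char> pixels;

    explicit Frame(std::size_t bytes) : pixels(bytes) { assert(bytes > 0); }
    Frame(const Frame& other) : pixels(other.pixels) { ++copies; }
    Frame(Frame&&) noexcept = default;
    Frame& operator=(const Frame&) = default;
    Frame& operator=(Frame&&) noexcept = default;
};
int Frame::copies = 0;

// Pass by value: the vector and every frame in it are copied on each call.
std::size_t totalBytesByValue(std::vector<Frame> frames) {
    std::size_t total = 0;
    for (const Frame& frame : frames) total += frame.pixels.size();
    return total;
}

// Pass by const reference: no pixel data is copied.
std::size_t totalBytesByReference(const std::vector<Frame>& frames) {
    std::size_t total = 0;
    for (const Frame& frame : frames) total += frame.pixels.size();
    return total;
}
```

With five 1280x720 RGB frames, the by-value call copies about 13.8 MB every time it runs, while the by-reference call copies nothing. The same applies to range-based loops: `for (auto frame : frames)` copies each element, `for (const auto& frame : frames)` does not.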

The main pain points of doing image operations in oF are always transfers between the CPU and the GPU (which is where ofTexture can help) and per-pixel operations on the CPU (for example, if you’re doing the blurring with ofPixels or something similar).

@ayruos about keeping things in RAM: note that the spec requires delays that are much longer than what one can hold in RAM/VRAM, so it is necessary to hit the disk, in a manner that allows immediate random-access from multiple readers. but I agree that external video devices won’t really help, as anyway the delayed images must end up composited/blurred in the same frame buffer.

@s_e_p I’m curious to see how the code is organized as it should not be a problem to stream 5 * 1280x720, but threading is probably required to make sure the disk access and JPEG (de)compression are handled out of the update-draw cycle. can you post your project somewhere? also: what is the longest delay you’re aiming for?
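As a rough illustration of keeping disk access and decoding out of the update-draw cycle, here is a sketch of a background loader in standard C++. The class name and the `loadFn` hook are assumptions for illustration; `loadFn` stands in for "read the file and decode the JPEG" and would be replaced by the real loader. Uploading the pixels to a texture would still happen on the main thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Pixels = std::vector<unsigned char>;

class BackgroundFrameLoader {
public:
    // loadFn is a hypothetical hook: given a frame index, return decoded pixels.
    explicit BackgroundFrameLoader(std::function<Pixels(std::int64_t)> loadFn)
        : loadFn_(std::move(loadFn)) {
        assert(loadFn_);
        worker_ = std::thread([this] { run(); });
    }

    ~BackgroundFrameLoader() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true; // requests still pending are dropped
        }
        wake_.notify_all();
        worker_.join();
    }

    BackgroundFrameLoader(const BackgroundFrameLoader&) = delete;
    BackgroundFrameLoader& operator=(const BackgroundFrameLoader&) = delete;

    // Main thread: queue a frame for loading and return immediately.
    void request(std::int64_t frameIndex) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(frameIndex);
        }
        wake_.notify_one();
    }

    // Main thread: fetch a finished frame without blocking.
    // Returns false if that frame has not been loaded yet.
    bool tryTake(std::int64_t frameIndex, Pixels& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = ready_.find(frameIndex);
        if (it == ready_.end()) return false;
        out = std::move(it->second);
        ready_.erase(it);
        return true;
    }

private:
    void run() {
        for (;;) {
            std::int64_t index = 0;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (stopping_) return;
                index = pending_.front();
                pending_.pop_front();
            }
            Pixels pixels = loadFn_(index); // slow work runs outside the lock
            std::lock_guard<std::mutex> lock(mutex_);
            ready_[index] = std::move(pixels);
        }
    }

    std::function<Pixels(std::int64_t)> loadFn_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::int64_t> pending_;
    std::map<std::int64_t, Pixels> ready_;
    bool stopping_ = false;
    std::thread worker_;
};
```

The idea would be to call `request()` a few frames ahead of when an image is needed and `tryTake()` in update(). Frames that are requested but never taken stay in memory, so a real version would also need to discard stale entries.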

adding on to above point,

one other thing that would be helpful to know is exactly where the slowdown is. I’d use a profiler first, and then perhaps test only RAM → GPU and only disk → RAM in isolation to figure out what the bottleneck is. I will often just comment out draw() and see what update() is doing in terms of performance, or reduce what happens in update() to see what draw() is doing in terms of the overall speed.

in general the slowest things will be (IMO) going from CPU to GPU (streaming 5x 1280x720 pixels to the GPU) and reading from disk. Reading from disk could perhaps be helped by threading. It’s hard / impossible to thread GPU operations, as OpenGL is single-threaded.
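For a sense of scale, the arithmetic behind "streaming 5x 1280x720 pixels" can be sketched as below. It assumes uncompressed 8-bit pixels (3 bytes for RGB, 4 for RGBA) and the 20fps mentioned earlier in the thread; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one uncompressed frame.
std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                         std::uint64_t bytesPerPixel) {
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}

// Bytes uploaded per second when `streams` frames are uploaded on every frame.
std::uint64_t uploadBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel,
                                   std::uint64_t streams, std::uint64_t fps) {
    return frameBytes(width, height, bytesPerPixel) * streams * fps;
}
```

One 1280x720 RGB frame is 2,764,800 bytes, so five streams at 20fps come to 276,480,000 bytes per second (368,640,000 with RGBA). Whether that is actually the bottleneck on a given machine is something only measurement can show.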

I think knowing exactly per frame, how much time is spent doing what can help you make choices about what to optimize. for example, if you learn it’s mostly just uploading the image data to gpu, perhaps there are optimizations that can be made using a more compressed format, that you can decompress on the gpu. If it’s in reading from disk, perhaps threading and/or some optimizations on format can help there too.
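When a profiler is not at hand, per-section timing can be done by hand. The following is a sketch using `std::chrono`; `SectionTimes` and its members are hypothetical names, and the section labels are whatever the stages of the app happen to be (disk read, decode, upload, draw).

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

class SectionTimes {
public:
    using Clock = std::chrono::steady_clock;

    // Adds one measurement to the running total of a named section.
    void add(const std::string& name, Clock::duration elapsed) {
        assert(elapsed >= Clock::duration::zero());
        Entry& entry = entries_[name];
        entry.totalMs += std::chrono::duration<double, std::milli>(elapsed).count();
        entry.calls += 1;
    }

    // Total time recorded for a section, in milliseconds (0 if never measured).
    double totalMs(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? 0.0 : it->second.totalMs;
    }

    // Number of measurements recorded for a section.
    int calls(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? 0 : it->second.calls;
    }

    // Measures the time between its construction and destruction.
    class Scope {
    public:
        Scope(SectionTimes& owner, std::string name)
            : owner_(owner), name_(std::move(name)), start_(Clock::now()) {}
        ~Scope() { owner_.add(name_, Clock::now() - start_); }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        SectionTimes& owner_;
        std::string name_;
        Clock::time_point start_;
    };

private:
    struct Entry {
        double totalMs = 0.0;
        int calls = 0;
    };
    std::map<std::string, Entry> entries_;
};
```

Wrapping each stage in a block such as `{ SectionTimes::Scope s(times, "decode"); /* decode here */ }` and printing `totalMs(name) / calls(name)` every few seconds gives a per-frame average for each stage. Note that this only measures CPU-side time; GPU work that is queued asynchronously will not show up where it is issued.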

lots of good tips from @zach

Knowing where the slowdown is taking place will help you put your energy into optimizing the things that will make a difference.

It is also good to not just assume A will be faster than B because it seems logical.

For example with PNG vs JPEG: a PNG is typically larger, so it could end up taking more time to read from disk, but this might be offset by JPEG needing more CPU processing to decompress. It can be really hard to say which one will be faster without testing.

I would also look into the ofSaveImage flags:

You might find that you don’t need OF_IMAGE_QUALITY_BEST and a lower quality will be faster to save and load.

And the ofImageLoadSettings freeImageFlags which might give options for loading a little faster:

See the FreeImage load flags, in particular JPEG_FAST:
https://documentation.help/FreeImage.NET/T_FreeImageAPI_FREE_IMAGE_LOAD_FLAGS.htm

I would probably only look at threading if you find most of the CPU usage is in the load-pixels-from-memory stage rather than the file read stage, and I would try everything else first before implementing threading, as it is always trickier than it first seems.

But as Zach said, first try to isolate where most of the time is being spent, either by manually timing different sections of code or by using a profiler (I only know how to do this on macOS).

Hope that helps!
Theo

Yeah, and now with SSDs it is really fast to write uncompressed files. I even wrote an uncompressed TIFF writer to try to save image sequences as fast as possible, though not as raw data. It only saves RGB888 files, but it was the fastest way of saving image sequences for me. The problem is that it takes a lot of disk space.
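The TIFF writer mentioned above is not shown in the thread. As a stand-in that illustrates the same idea (a tiny header followed by the RGB888 bytes, with no compression step), here is a sketch that writes binary PPM (P6) files instead of TIFF. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Header of a binary PPM (P6) file with 8 bits per channel.
std::string ppmHeader(int width, int height) {
    assert(width > 0 && height > 0);
    return "P6\n" + std::to_string(width) + " " + std::to_string(height) + "\n255\n";
}

// Size on disk of one uncompressed frame, header included.
std::size_t ppmFileSize(int width, int height) {
    return ppmHeader(width, height).size() +
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height) * 3;
}

// Writes tightly packed RGB888 pixels to `path`. Returns false on I/O failure.
bool writePpmFile(const std::string& path, int width, int height,
                  const std::vector<unsigned char>& rgb) {
    assert(rgb.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height) * 3);
    std::ofstream file(path, std::ios::binary);
    if (!file) return false;
    const std::string header = ppmHeader(width, height);
    file.write(header.data(), static_cast<std::streamsize>(header.size()));
    // Pixel data goes to disk as-is: no encoding work on the CPU.
    file.write(reinterpret_cast<const char*>(rgb.data()),
               static_cast<std::streamsize>(rgb.size()));
    return static_cast<bool>(file);
}
```

The disk-space cost is easy to estimate from `ppmFileSize`: a 1280x720 frame is 2,764,816 bytes, so one minute at 20fps is roughly 3.3 GB.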