The eternal question about video playback performance, but this time with some background and constraints. I’m mainly interested in high-performance saving and loading of image sequences to and from disk.
By high performance I mean:
Capable of storing high-resolution frames (e.g. 4K), or
high frame rates (e.g. 1000+ fps) at smaller resolutions.
Mostly interested in saving with lossless compression: PNG, TIFF, LZ4, raw binary.
Loading JPEG 2000 would be good. I’m considering a worker thread that converts the lossless frames to JPEG 2000 in case the sequence has to be displayed.
No audio sync; I won’t use audio for now.
Starting on Windows and porting afterwards.
Displaying in real time of course means showing only 60 fps out of 1000 (or whatever high frame rate is used). But it could be interesting to play back at high fps without display, just for processing purposes.
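To make the 60-out-of-1000 idea concrete, here is a minimal C++ sketch that picks which recorded frame to show on each display refresh during real-time playback. The helper name sourceFrameForTick is hypothetical, and it assumes a constant recording frame rate and a fixed display refresh rate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: for real-time playback, returns the recorded frame to
// show on display refresh number `tick`. Assumes a constant source frame rate
// and a fixed display refresh rate.
uint64_t sourceFrameForTick(uint64_t tick, uint64_t displayHz, uint64_t sourceFps) {
    assert(displayHz > 0);
    return tick * sourceFps / displayHz;  // integer division rounds down
}
```

With a 60 Hz display and a 1000 fps recording, consecutive ticks land on frames 0, 16, 33, 50, … so only 60 of every 1000 recorded frames ever need to be decoded for display; the rest remain available for offline processing.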
Scrubbing will suffer since we are reading from disk. Not sure what delay is expected.
The recording part is trickier: dropping frames is not acceptable, so frames first need to be stored in memory and then saved to disk as fast as possible. If the sequence is very long we might run out of memory, at which point we start dropping frames and keep a log of them.
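The capture-to-disk hand-off described above could look roughly like this C++17 sketch. It assumes one capture thread pushing frames and one writer thread popping them; the FrameQueue class and its memory budget (counted in frames) are hypothetical. When the budget is full, push() refuses the frame and records its index, so the capture loop never blocks and the drop log comes for free.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// One captured frame: raw pixels plus its capture index.
struct Frame {
    uint64_t index;
    std::vector<uint8_t> pixels;
};

// Hypothetical bounded queue between the capture thread and a disk writer.
class FrameQueue {
public:
    explicit FrameQueue(size_t maxFrames) : maxFrames_(maxFrames) {}

    // Never blocks: if the budget is exhausted (or the queue is closed),
    // the frame is dropped and its index is logged.
    bool push(Frame f) {
        std::lock_guard<std::mutex> lock(m_);
        if (closed_ || q_.size() >= maxFrames_) {
            dropped_.push_back(f.index);
            return false;
        }
        q_.push_back(std::move(f));
        cv_.notify_one();
        return true;
    }

    // Blocks until a frame is available; returns nullopt once closed and drained.
    std::optional<Frame> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        Frame f = std::move(q_.front());
        q_.pop_front();
        return f;
    }

    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        cv_.notify_all();
    }

    std::vector<uint64_t> droppedIndices() const {
        std::lock_guard<std::mutex> lock(m_);
        return dropped_;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    std::deque<Frame> q_;
    std::vector<uint64_t> dropped_;
    size_t maxFrames_;
    bool closed_ = false;
};
```

The writer thread would loop on pop() and save each frame until close() has been called and the queue is drained; droppedIndices() is the log to write out at the end.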
Things I’ve tried:
So far I have been using my own version of ofxImageSequence (by @obviousjim and Co) to load from memory, import from disk, export to disk, and play back (updating a single texture…).
This works very well for a sequence that has been completely stored in memory. The problem is that storing long sequences in memory is not a good idea.
I have also been trying ofxVideoRecorder (@tims) under Windows (thanks to @DomAmato) with very good results. Saving is pretty fast when using PNG sequences. I haven’t done any benchmarking though.
Right, the eternal question. I guess there will never be a consensus, as “high performance” means different things in different scenarios. In the past I’ve used what ofxFastIOImage does and it works really well. The only problem is that you are tied to disk I/O speed. When I did this, I needed fast image I/O with alpha channels, and loading the PNGs directly wasn’t fast enough (I had to use a Mac mini with a spinning hard drive). There must be an optimal point between disk I/O and decoding/encoding.
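A raw dump in the spirit of ofxFastIOImage can be as simple as writing a small header followed by the pixel bytes, so loading is a single read with no decoding step. This is a minimal sketch, not the addon’s actual code; the "RAWF" header layout is made up for illustration and stores its integers in native byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Made-up 16-byte header: magic, then dimensions in native byte order.
struct RawHeader {
    char magic[4];       // "RAWF"
    uint32_t width;
    uint32_t height;
    uint32_t channels;   // e.g. 4 for RGBA
};
static_assert(sizeof(RawHeader) == 16, "header must have no padding");

// Writes the header, then the pixel buffer exactly as it sits in memory.
bool saveRaw(const std::string& path, const uint8_t* pixels,
             uint32_t w, uint32_t h, uint32_t c) {
    const RawHeader hdr{{'R', 'A', 'W', 'F'}, w, h, c};
    const size_t n = static_cast<size_t>(w) * h * c;
    FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    const bool ok = std::fwrite(&hdr, sizeof hdr, 1, f) == 1 &&
                    std::fwrite(pixels, 1, n, f) == n;
    return std::fclose(f) == 0 && ok;
}

// Reads the header, checks the magic, then reads all pixels in one call.
bool loadRaw(const std::string& path, RawHeader& hdr, std::vector<uint8_t>& pixels) {
    FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    bool ok = std::fread(&hdr, sizeof hdr, 1, f) == 1 &&
              std::memcmp(hdr.magic, "RAWF", 4) == 0;
    if (ok) {
        pixels.resize(static_cast<size_t>(hdr.width) * hdr.height * hdr.channels);
        ok = std::fread(pixels.data(), 1, pixels.size(), f) == pixels.size();
    }
    std::fclose(f);
    return ok;
}
```

The trade-off mentioned above is visible here: there is zero CPU cost per pixel, but every frame costs its full uncompressed size in disk bandwidth.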
For writing without dropping frames, a thread pool might be a good idea.
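One way to sketch such a pool with plain C++11 threads: each worker claims the next frame index from a shared atomic counter and writes it, so one slow write doesn’t hold up the rest. writeAllFrames is a hypothetical helper, and writeFrame stands in for whatever saver is used (raw dump, TIFF, PNG, …).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: writes frames [0, frameCount) with `workers` threads.
// Each thread repeatedly claims the next unwritten index, so work is
// balanced dynamically. Returns how many writes reported success.
size_t writeAllFrames(size_t frameCount, unsigned workers,
                      const std::function<bool(size_t)>& writeFrame) {
    if (workers == 0) workers = 1;
    std::atomic<size_t> next{0};
    std::atomic<size_t> succeeded{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) {
        pool.emplace_back([&] {
            for (size_t i = next++; i < frameCount; i = next++) {
                if (writeFrame(i)) ++succeeded;
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return succeeded.load();
}
```

writeFrame must be safe to call from several threads at once; writing each frame to its own file usually satisfies that.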
These are my two cents.
All the best
I’m the original author of the ofxHPVPlayer addon. I devised this file format and player to have a lightweight and transparent way of playing high-speed & high-fps video files. In doing so, I’ve come across a lot of the conditions you point out in your post.
When playing video files, there is always a balance between disk throughput (we have to keep reading chunks of data continuously, since we can’t buffer gigabytes of it) and the processing power needed to decode the video frames. There is a reason you won’t easily find a TIFF video player. Uncompressed 4K TIFFs amount to enormous file sizes per frame (about 25 MB for 8-bit RGB), which require very, very fast disk read speeds. On top of that, you need to pipe this uncompressed RGB from host memory (RAM) to video memory (GPU), which is inefficient for large frame sizes. If you’re only interested in very small frame sizes, e.g. 320x240, this becomes less of an issue.
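The disk-throughput side of that balance is easy to estimate. This sketch (helper names are illustrative) computes the sustained bandwidth uncompressed 8-bit frames need: 4K RGB at 60 fps already requires about 1.49 GB/s, well beyond the roughly 600 MB/s a SATA III link can carry, and 640x480 RGB at 1000 fps needs about 0.92 GB/s.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed bytes per frame at 8 bits per channel.
constexpr uint64_t frameBytes(uint64_t w, uint64_t h, uint64_t channels) {
    return w * h * channels;
}

// Sustained bytes per second needed to stream frames at `fps`.
constexpr uint64_t bytesPerSecond(uint64_t w, uint64_t h, uint64_t channels, uint64_t fps) {
    return frameBytes(w, h, channels) * fps;
}

// 4K (3840x2160) RGB: ~24.9 MB per frame, ~1.49 GB/s at 60 fps.
static_assert(frameBytes(3840, 2160, 3) == 24883200, "4K RGB frame size");
static_assert(bytesPerSecond(3840, 2160, 3, 60) == 1492992000, "4K RGB @ 60 fps");
// 640x480 RGB at 1000 fps: ~0.92 GB/s.
static_assert(bytesPerSecond(640, 480, 3, 1000) == 921600000, "VGA RGB @ 1000 fps");
```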
This is why compression schemes are used. Efficient video compression schemes are those that can easily be encoded/decoded on the GPU or other dedicated hardware in your machine. Since video textures will most likely have to be displayed on screen at some point, this makes sense: the path from video memory to the frame buffer is shorter than the one from RAM. And because frames are compressed, the load on the disk is significantly lower.
With a bit of generalisation, one could distinguish two families of compression schemes:
The first involves complex motion-compensated compression techniques such as H.264, HEVC, VP9, … These schemes result in small file sizes but require dedicated decoding hardware to decode in real time. Frames are not always independent entities, and decoding a frame sometimes requires previous and following frames. As a result, scrubbing requires more work. Many of today’s devices focus on these techniques since data sizes are small, files are easy to transport over a network, and decoding support is often built directly into chips (e.g. the Raspberry Pi’s VideoCore).
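Why scrubbing costs more with these codecs can be shown with a simplified model: assume a keyframe every `gop` frames and that every other frame depends only on its predecessor. Real H.264/HEVC streams with B-frames and open GOPs are more involved, so treat this as an illustration only; the function name is made up.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumption): keyframe every `gop` frames, each other
// frame depends only on the previous one. Returns how many frames must be
// decoded to show frame `target` after a random seek.
uint64_t framesToDecodeForSeek(uint64_t target, uint64_t gop) {
    assert(gop > 0);
    const uint64_t keyframe = target - target % gop;  // last keyframe at or before target
    return target - keyframe + 1;                     // keyframe .. target, inclusive
}
```

With an intra-only format (gop = 1), every seek costs exactly one decode, which is what the per-frame family relies on.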
The second involves per-frame compression techniques, where each frame is an individual entity that is encoded/decoded separately. These schemes tend to rely on texture decompression built into graphics cards. Some of these lossy compression schemes (DXT) have existed since the 90s, with underlying technology developed as early as the 70s. Decompression happens very efficiently on the GPU, but file sizes are bigger than with H.264, … Some of the addons you list focus on these desktop playback techniques: HAP, HPV, DXV (Resolume). All of them compress each frame even further with CPU-based techniques such as libsquish, LZ4, … File sizes are still big, but with today’s disk sizes and prices this is less of an issue. Frame sizes up to 8K @ 60 fps are definitely achievable.
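The fixed-size block structure of DXT makes its frame sizes easy to compute: S3TC/DXT stores every 4x4 pixel block in 8 bytes (DXT1/BC1) or 16 bytes (DXT5/BC3). The sketch below (the helper name is made up) gives the size before any additional CPU-side compression such as LZ4: a 4K DXT5 frame is 8,294,400 bytes (about 8.3 MB) versus 33,177,600 bytes for raw RGBA, a 4:1 reduction.

```cpp
#include <cassert>
#include <cstdint>

// Size of one DXT-compressed frame. Partial blocks at the right/bottom edges
// still occupy a full 4x4 block. bytesPerBlock: 8 for DXT1, 16 for DXT5.
uint64_t dxtFrameBytes(uint64_t w, uint64_t h, uint64_t bytesPerBlock) {
    const uint64_t blocksX = (w + 3) / 4;
    const uint64_t blocksY = (h + 3) / 4;
    return blocksX * blocksY * bytesPerBlock;
}
```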
For your use case I would definitely try out HAP or HPV first before building something yourself, since they focus specifically on this kind of use case. They are less suited for real-time encoding, but I’m not sure what you’re trying to achieve there. And yes, they use lossy compression, but you have to ask yourself whether this is really an issue. You get quite good quality with DXT5/HapQ. Plus, it’s part of the trade-off discussed above.
I’m not 100% sure, but I don’t believe PNG (DEFLATE) or JPEG 2000 have the same broad hardware decoding support, so (at least for large frame sizes) they are not your best bet.
BTW, memory mapping is not a Windows-only technique. It’s a general programming concept: a file (or device) is mapped into a process’s address space so it can be read and written like ordinary memory. Each platform exposes it through its own API (e.g. CreateFileMapping/MapViewOfFile on Windows, mmap on POSIX systems).
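For reference, a minimal read-only mapping on POSIX systems (Linux, macOS) could look like the sketch below; on Windows the same idea goes through CreateFileMapping and MapViewOfFile instead. MappedFile is an illustrative RAII wrapper, not part of openFrameworks. Once mapped, a frame file’s bytes can be used directly without first copying them into a separate read buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Illustrative RAII wrapper: maps a whole file read-only (POSIX only).
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const uint8_t*>(p);
                size_ = static_cast<size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<uint8_t*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return data_ != nullptr; }
    const uint8_t* data() const { return data_; }
    size_t size() const { return size_; }

private:
    const uint8_t* data_ = nullptr;
    size_t size_ = 0;
};
```

Pages are loaded lazily by the OS on first access, so the first pass over a cold file is still bound by disk speed; the gain is mainly in avoiding extra copies and system calls.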
Thank you for your reply! I think I’ll start with something similar to ofxFastIOImage, using some lossless compression via Squash. I did some quick and dirty tests two years ago doing something similar to ofxFastIOImage, so I’ll bring them back and compare that code with the addon. I remember using ofBuffer to write non-standard image formats to disk, but I didn’t notice back then that saving like that seems to be much faster than using ofSaveImage.
Posts like this will help many people understand more about these topics. Your explanation also sheds a lot of light on the importance of your addon. Thank you for such a good addon.
I should have emphasized that it is important for me to keep a lossless copy of each image. The main purpose is to dump as many images as possible from a device as they are, or at least compressed in a way that lets me recover an identical copy of the source image (lossless). So for now I’m thinking about keeping two separate stores of the same images: the heavy lossless one (not used for display) and a lossy one using your addon. Doing it this way is appealing because the lossy compression can be done afterwards, in non-realtime; the images can then be happily displayed at high fps while the original lossless ones are kept for other purposes. As for motion-compensated compression schemes, I’m not very interested in them yet, but they could come up for data portability.
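Since the lossless store has to give back an identical copy, one cheap safeguard is to record a checksum of each frame at capture time and later verify decoded frames against it, without keeping the original around. This sketch uses 64-bit FNV-1a purely because it fits in a few lines (CRC32 or xxHash would be more common choices), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a hash of a byte buffer (not cryptographic; fine for catching
// corruption or a codec that is not actually lossless).
uint64_t fnv1a64(const uint8_t* data, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (size_t i = 0; i < n; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;           // FNV prime
    }
    return h;
}

// Hypothetical check: does a decoded frame match the checksum stored at capture?
bool matchesStoredChecksum(const std::vector<uint8_t>& decoded, uint64_t storedHash) {
    return fnv1a64(decoded.data(), decoded.size()) == storedHash;
}
```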
Ah, and thank you very much for pointing out the memory-mapping APIs. I’m still wondering whether I should give them a shot or whether they’re not useful at all nowadays…
If you’re targeting Windows/Unix on a system with a recent Nvidia GPU and you really want to go all the way on this, you might want to check out the Nvidia Video Codec SDK. This is an API for the hardware encoders/decoders built into Nvidia cards. They support all kinds of formats, some with lossless modes; I know H.264 and HEVC both have lossless options. I’ve worked with this API and found it to be the most powerful and efficient option out there, for both encoding and decoding.
Now, this is not for the faint-hearted, but it certainly pays off…
This year I tried to record video output as fast as possible and wrote an uncompressed TIFF writer which, used from multiple threads, performs similarly to ofxFastIOImage.
I’ve tried recording everything to a RAM disk to increase speed, but on my system this performs very similarly to writing to an SSD.
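For anyone curious what an uncompressed TIFF writer involves, here is a minimal sketch (not the code from the post): a baseline little-endian TIFF with a single strip of interleaved 8-bit RGB. The header, IFD and out-of-line values come to 180 bytes and are written first; the pixel buffer is then written as-is, so there is no per-pixel work and no extra copy. Function names are illustrative, and RGBA, 16-bit samples or multiple strips would need additional tags.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Little-endian helpers for building the header.
static void put16(std::vector<uint8_t>& b, uint16_t v) {
    b.push_back(static_cast<uint8_t>(v & 0xFF));
    b.push_back(static_cast<uint8_t>(v >> 8));
}
static void put32(std::vector<uint8_t>& b, uint32_t v) {
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<uint8_t>(v >> (8 * i)));
}
// One 12-byte IFD entry; a single SHORT value is left-justified in the value field.
static void entry(std::vector<uint8_t>& b, uint16_t tag, uint16_t type,
                  uint32_t count, uint32_t value) {
    put16(b, tag); put16(b, type); put32(b, count);
    if (type == 3 && count == 1) { put16(b, static_cast<uint16_t>(value)); put16(b, 0); }
    else put32(b, value);
}

// Header + IFD + out-of-line values for uncompressed 8-bit RGB (180 bytes).
std::vector<uint8_t> tiffHeaderRGB8(uint32_t w, uint32_t h) {
    const uint16_t SHORT = 3, LONG = 4, RATIONAL = 5;
    const uint16_t kEntries = 12;
    const uint32_t bpsOff = 8 + 2 + kEntries * 12 + 4;  // 158: BitsPerSample values
    const uint32_t xresOff = bpsOff + 6;                // 164
    const uint32_t yresOff = xresOff + 8;               // 172
    const uint32_t dataOff = yresOff + 8;               // 180: pixel data
    std::vector<uint8_t> b;
    b.push_back('I'); b.push_back('I');                 // little-endian
    put16(b, 42); put32(b, 8);                          // magic, first IFD offset
    put16(b, kEntries);                                 // entries sorted by tag
    entry(b, 256, LONG, 1, w);                          // ImageWidth
    entry(b, 257, LONG, 1, h);                          // ImageLength
    entry(b, 258, SHORT, 3, bpsOff);                    // BitsPerSample
    entry(b, 259, SHORT, 1, 1);                         // Compression: none
    entry(b, 262, SHORT, 1, 2);                         // Photometric: RGB
    entry(b, 273, LONG, 1, dataOff);                    // StripOffsets
    entry(b, 277, SHORT, 1, 3);                         // SamplesPerPixel
    entry(b, 278, LONG, 1, h);                          // RowsPerStrip: one strip
    entry(b, 279, LONG, 1, w * h * 3);                  // StripByteCounts
    entry(b, 282, RATIONAL, 1, xresOff);                // XResolution
    entry(b, 283, RATIONAL, 1, yresOff);                // YResolution
    entry(b, 296, SHORT, 1, 2);                         // ResolutionUnit: inch
    put32(b, 0);                                        // no further IFDs
    put16(b, 8); put16(b, 8); put16(b, 8);              // 8 bits per channel
    put32(b, 72); put32(b, 1);                          // XResolution = 72/1
    put32(b, 72); put32(b, 1);                          // YResolution = 72/1
    assert(b.size() == dataOff);
    return b;
}

// Writes the header, then the raw interleaved RGB buffer (w * h * 3 bytes).
bool writeTiffRGB8(const std::string& path, const uint8_t* rgb, uint32_t w, uint32_t h) {
    const std::vector<uint8_t> hdr = tiffHeaderRGB8(w, h);
    const size_t n = static_cast<size_t>(w) * h * 3;
    FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    const bool ok = std::fwrite(hdr.data(), 1, hdr.size(), f) == hdr.size() &&
                    std::fwrite(rgb, 1, n, f) == n;
    return std::fclose(f) == 0 && ok;
}
```

Each call is independent, so several frames can be written concurrently from worker threads, one file per frame.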