Hello all,
I’m working on building a sort of algorithmic graphic-match generative video editor (don’t worry, I have a whole bunch of paragraphs that unpack that statement, if you’re interested) that edits live from a bank of videos. My instructor, @bakercp, has pointed me to ofxTSNE (via @genekogan and the work of Laurens van der Maaten). I just wanted to share the interesting thing that happens when you take an image sequence from a video, or a few videos, and plug it into the ofxTSNE example-images.
Here is an image sequence from a video of some emergency vehicles leaving a station, shot panning from a fixed position:
(Sorry, I lost the links to the YouTube videos I pulled these examples from. I renamed the files without thinking about that…)
Pretty cool. The only problem is that you need a lot of images (frames per second) from the video to get these “pathways” that the visual characteristics carve out. It’s very time-consuming to compile, and fairly heavy on RAM and CPU when everything runs in only a few threads. I’ve tried with lots of images from lots of videos. The emergency-vehicles example above is 5 frames per second over 30 seconds: not a lot.
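For a sense of the frame counts involved, here’s a quick sketch of the sampling arithmetic. The 30 fps source rate is my assumption (the post doesn’t state the clips’ actual frame rate); the 5 fps sample rate and 30-second duration are from the example above.

```python
# Assumed: a 30 fps source clip (not stated above), sampled at
# 5 frames per second over 30 seconds, as in the example.
source_fps = 30
sample_fps = 5
duration_s = 30

step = source_fps // sample_fps  # keep every 6th source frame
frame_indices = list(range(0, source_fps * duration_s, step))
print(len(frame_indices))  # 150 frames for a single 30-second clip
```

Even at this modest rate, a handful of clips quickly produces many hundreds of images for the t-SNE to chew through, which is where the compile time and memory cost come from.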
Here’s an example from a couple different videos that might cross “pathways.”
And I said might, because to me these videos are very similar: both are fairly static shots of a large crowd of people sitting on the floor of a room, even facing the same way. There’s a fair difference in overall color palette, but nonetheless there’s a striking graphic match between the two. Still, I suspect that because this t-SNE is built to differentiate (to the point where it spreads these pathways apart from each other depending on how many things it has to tell apart), it would take a third video with completely different visual characteristics to persuade the t-SNE to put these two videos a little closer to each other.
Let’s test:
Nope.
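That “nope” matches what t-SNE is known to do: it preserves local neighborhoods, not global distances between clusters, so a far-away third cluster doesn’t reliably pull two similar clusters together. This isn’t the ofxTSNE pipeline, just a minimal sketch with scikit-learn’s `TSNE` on synthetic “frame features” (the feature dimension, cluster offsets, and perplexity are all made-up illustration values):

```python
# Minimal sketch, not the ofxTSNE pipeline: three synthetic "videos"
# as Gaussian clouds in a made-up 50-dimensional feature space.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
dim = 50
# Two "similar videos": overlapping clouds near the origin.
video_a = rng.normal(loc=0.0, scale=1.0, size=(60, dim))
video_b = rng.normal(loc=0.5, scale=1.0, size=(60, dim))
# A third, very different "video": a cluster far away in feature space.
video_c = rng.normal(loc=20.0, scale=1.0, size=(60, dim))

features = np.vstack([video_a, video_b, video_c])
embedding = TSNE(n_components=2, perplexity=20, init="pca",
                 random_state=0).fit_transform(features)
print(embedding.shape)  # (180, 2)
```

Plotting `embedding` colored by source cluster shows the general pattern: a and b land near each other while c sits apart, but the gap between the a/b pathways themselves doesn’t meaningfully shrink just because c was added, since t-SNE’s cost function mostly cares about which points are each other’s neighbors.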
I am writing this post because I’m not really sure what I’m doing, and I’m not sure I fully understand what the t-SNE is supposed to be doing either. I have a general idea, and that idea seems very relevant to the vision of my project, but I feel as though I’m overlooking something; it may be entirely subjective, or depend on the particular videos chosen.
Either way, I thought I’d share what’s in my little pocket of research.