Real-time audio-visual with StyleGAN 2

I’m looking for good resources on how to make visuals generated by StyleGAN2 react to an audio source in real time. Both audio and video should be driven by deep learning: StyleGAN2 for the visual generation, and audio analysis to explore the latent space.
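One common approach is to map an audio feature (loudness, onsets, spectral content) onto movement in the latent space. Here is a minimal NumPy-only sketch of the simplest version of that idea: audio RMS energy interpolating between two latent vectors. The gain factor and the 512-dimensional z-space are assumptions (512 is StyleGAN2’s default latent size); in a real setup the buffer would come from a sound-card callback and the resulting z would be fed to the generator every frame.

```python
import numpy as np

LATENT_DIM = 512  # StyleGAN2's default z-space dimensionality

def rms(buffer):
    """Root-mean-square energy of one audio buffer (samples in [-1, 1])."""
    return float(np.sqrt(np.mean(np.square(buffer))))

def audio_to_latent(buffer, z_a, z_b):
    """Interpolate between two latent vectors based on audio energy.

    Quiet frames stay near z_a, loud frames move toward z_b.
    """
    t = np.clip(rms(buffer) * 4.0, 0.0, 1.0)  # crude gain, tune to taste
    return (1.0 - t) * z_a + t * z_b

# demo with synthetic data instead of a live audio stream
rng = np.random.default_rng(0)
z_a = rng.standard_normal(LATENT_DIM)
z_b = rng.standard_normal(LATENT_DIM)
silence = np.zeros(1024)          # RMS 0.0 -> stays at z_a
loud = np.full(1024, 0.5)         # RMS 0.5 -> clipped to t=1, lands on z_b
print(audio_to_latent(silence, z_a, z_b).shape)  # (512,)
```

More interesting mappings (per-band energies driving different latent directions, onsets triggering jumps) follow the same pattern: one audio feature in, one latent offset out, recomputed each frame.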
Here’s an example by Mario Klingemann:

Thanks in advance!

Hi, I am not sure it will work in real time unless you have a really powerful machine.
Take a look at

Hey there!
Last year I was actually doing this with 256x256 images. I used a DCGAN and ofxTensorFlow2. You can check it out here.

Note: there is a branch with the example_liveGAN. It runs at about 24 fps on an AMD 3600 CPU. I just found this post while looking for a better model, hoping to get a higher resolution.


Oh, just checked: with the DCGAN included you can run 512x512 on an RTX 2070 Super at 26 fps. The generator has 100M parameters.
I trimmed the memory allocated by TensorFlow down to 10% and it is still running 🙂
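For anyone wanting to do the same, capping TensorFlow’s GPU memory looks roughly like this configuration fragment (the 800 MB figure is an assumption, roughly 10% of an 8 GB card; adjust for your GPU — I don’t know exactly how ofxTensorFlow2 exposes this on the C++ side):

```python
import tensorflow as tf

# TF2: cap the GPU memory TensorFlow may allocate.
# memory_limit is in MB, so 800 is ~10% of an 8 GB card.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=800)],
    )

# TF1-style equivalent, as a fraction instead of an absolute limit:
# config = tf.compat.v1.ConfigProto()
# config.gpu_options.per_process_gpu_memory_fraction = 0.1
```

This only limits what TensorFlow grabs up front; if the model genuinely needs more than the cap, allocation will fail at runtime.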
Next I’ll check out 720p with fp16 or integer quantization. I may also try depthwise-separable convolutions to trim down computational cost.
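A quick back-of-envelope calculation shows why depthwise-separable convolutions look attractive here. Ignoring biases, a standard k×k conv has k·k·c_in·c_out weights, while the depthwise+pointwise factorization has k·k·c_in + c_in·c_out; the 3×3/256-channel example below is hypothetical, just to illustrate the ratio:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise (k x k per input channel) plus pointwise (1 x 1) conv."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 256, 256
std = conv_params(k, c_in, c_out)            # 589824
sep = separable_conv_params(k, c_in, c_out)  # 67840
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

Roughly an 8.7x reduction in weights (and a similar cut in multiply-adds) for this layer shape, at some cost in model capacity, so quality at a given resolution may drop.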