This Controllable AI Dreams Up Images For You

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. As machine learning research advances over
time, learning-based techniques are getting better and better at generating images, or
even creating videos when given a topic. A few episodes ago, we talked about DeepMind's
Dual Video Discriminator technique, in which multiple neural networks compete against each
other, teaching our machines to synthesize a collection of 2-second-long videos. One of the key advantages of this method was
that it also learned the concept of changes in the camera view, zooming in on an object,
and understood that if someone draws something with a pen, the ink has to remain on the paper
unchanged. However, generally, when we ask an AI
to synthesize assets for us, we often have an exact idea of what
we are looking for. In these cases, we want a little
more artistic control than this technique offers. So, can we get around this? If so, how? Well, we can! I'll tell you how in a moment, but to understand
this solution, we first have to have a firm grasp on the concept of latent spaces. You can think of a latent space as a compressed
representation that tries to capture the essence of the dataset that we have at hand. You can see a similar latent space method
in action here that captures the key features that set different kinds of fonts apart and
presents these options on a 2D plane, and here, you see our technique that builds a
latent space for modeling a wide range of photorealistic material models that we can
explore. And now to this new work. What this tries to do is find a path in the
latent space of these images that relates to intuitive concepts like camera zooming,
rotation or shifting. That’s not an easy task, but if we can pull
it off, we’ll have more artistic control over these generated images, which would be
immensely useful for many creative tasks. This new work can perform that, and not only
that, but it is also able to learn the concept of color enhancement, and can even increase
or decrease the contrast of these images. The key idea of this paper is that this can
be done through trying to find crazy, non-linear trajectories in these latent spaces that happen
to relate to these intuitive concepts. It is not perfect in a sense that we can indeed
zoom in on the picture of this dog, but the posture of the dog also changes, and it even
seems like we’re starting out with a puppy that grows up frame by frame. This means that we have learned to navigate
this latent space, but there is still some additional fat in these movements, which is
a typical side effect of latent-space-based techniques. And don't forget that the
training data the AI is given also has its own limits. However, as you see, we are now one step closer
to not only having an AI that synthesizes images for us, but one that does it exactly
with the camera setup, rotation, and colors that we are looking for. What a time to be alive! If you wish to see beautiful formulations
of walks…walks in latent spaces, that is, make sure to have a look at the paper in the
video description. Also, note that we have now appeared on Instagram
with bite-sized pieces of our bite-sized videos. Yes, it is quite peculiar. Make sure to check it out: just search for
Two Minute Papers on Instagram or click the link in the video description. Thanks for watching and for your generous
support, and I’ll see you next time!
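
The walk described above can be sketched in a few lines: once a direction in latent space has been found, editing an image amounts to nudging the latent code along that direction and re-running the generator. The snippet below is a toy illustration only — the linear "generator", the direction vector, and the step sizes are made-up stand-ins, not the paper's trained models, and a real GAN generator would be a deep network rather than a matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_dim = 8, 16

# Toy stand-in for a trained generator: a fixed linear map from
# latent code to a flattened "image".
G = rng.standard_normal((img_dim, latent_dim))

def generate(z):
    return G @ z

# A hypothetical direction in latent space; in the paper, such a
# direction (or non-linear trajectory) is learned so that moving
# along it corresponds to, say, zooming or shifting the camera.
direction = rng.standard_normal(latent_dim)
direction /= np.linalg.norm(direction)

z = rng.standard_normal(latent_dim)

# Walk along the direction with increasing step size alpha,
# generating one "image" per step.
walk = [generate(z + alpha * direction) for alpha in (0.0, 0.5, 1.0, 1.5)]

# With this linear toy generator, equal latent-space steps give
# equal image-space steps; a real generator warps the path.
steps = [np.linalg.norm(walk[i + 1] - walk[i]) for i in range(3)]
print(np.allclose(steps, steps[0]))  # → True
```

The dog example in the video shows why this is hard in practice: with a real generator, the direction for "zoom" is entangled with other concepts, so the image changes in unintended ways along the walk.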

36 thoughts on “This Controllable AI Dreams Up Images For You”

  1. Has anyone stitched these generated images together to create a video? I'd love to see what it looks like, because it's hard to tell frame by frame.

  2. Another informative paper. I hope I can soon do research like this with proper equipment and environment.

  3. Except for the movements in 3D space, can't everything else be performed directly on the image as post-processing, like contrast enhancement, shift X, shift Y, zoom, etc.?

  4. I do not see the point of an AI capable of increasing contrast on an image, since we already have efficient algorithms to achieve this. Could someone explain, please 🙂 ?

  5. Awesome, would love to see the next iteration that gives you more complex control, such as control over what’s in the picture

  6. I'm starting to play with 2D game development, and making the sprites takes a lot of work. Maybe it can help in this task too.

  7. We're probably going to need yet another AI to help navigate these latent spaces. There are so many potential unwanted changes tied to the concepts being defined that navigating the paths that represent them is really hard. It's difficult to even define what the path is. Just in the examples you see here, it's not just the dog getting older, but the "zoom" equating to the laptop screen getting bigger, because how much space the laptop occupies in the final image is tied to both the size of the laptop and the position of the camera. And the "brightness" affecting the time of day in the city shot. While it's true that day is brighter than night (citation needed), it's not the intended result of manipulating the brightness in an image.

  8. I assume that if the AI has never been asked before to care about how close an object is, but only to describe what is shown (not how close, etc.) or to create a picture that shows a specific thing (no matter how close, etc.), then all its attempts at finding relevant paths within this space will not really create a perfect way of only changing the zoom.

  9. I see a huge failure in future internet searches as confirmation bias makes every conceivable idiotic idea photorealistically true, and curses paired with literalism lead to disaster after earth-ending disaster.

  10. I love how all of the videos look normal out of the corner of your eye, but once you actually look at one you go "wait that isn't a horse"
