AI Makes Stunning Photos From Your Drawings (pix2pix) | Two Minute Papers #133

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.

In an earlier work, we were able to change a photo of an already existing design according to our taste. That was absolutely amazing. But now, hold onto your papers and have a look at this! Because here, we can create something out of thin air!

The input in this problem formulation is an image, and the output is an image of a different kind. Let’s call this process image translation. It is translation in the sense that, for instance, we can add an aerial view of a city as an input and get the map of this city as an output. Or, we can draw the silhouette of a handbag and have it translated to an actual, real-looking object. And we can go even crazier: for instance, day to night conversion of a photograph is also possible. What an incredible idea, and look at the quality of the execution. Ice cream for my eyes.

And, as always, please don’t think of this algorithm as the end of the road. Like all papers, this is a stepping stone, and a few more works down the line, the kinks will be fixed, and the output quality is going to be vastly improved.

The technique uses a conditional adversarial network to accomplish this. This works the following way: there is a generative neural network that creates new images all day, and a discriminator network is also available all day to judge whether these images look natural or not. During this process, the generator network learns to draw more realistic images, and the discriminator network learns to tell fake images from real ones. If they train together for long enough, they will be able to reliably create these image translations for a large set of different scenarios.

There are two key differences that make this piece of work stand out from the classical generative adversarial networks. One: both neural networks have the opportunity to look at the before and after images. Normally we restrict the problem to only looking at the after images, the final results. And two: instead of only positive, both positive and negative examples are generated. This means that the generator network is also asked to create really bad images on purpose so that the discriminator network can more reliably learn the distinction between flippant attempts and quality craftsmanship.

Another great selling point here is that we don’t need several different algorithms for each of the cases. The same generic approach is used for all the maps and photographs; the only thing that is different is the training data.

Twitter has blown up with fun experiments; most of them include cute drawings ending up as horrifying-looking cats. As the title of the video says, the results are always going to be stunning, but sometimes a different kind of stunning than we’d expect. It’s so delightful to see that people are having a great time with this technique, and it is always a great choice to put out such a work for a wide audience to play with.

And if you got excited for this project, there are tons, and I mean tons of links in the video description, including one to the source code of the project, so make sure to have a look and read up some more on the topic. There’s going to be lots of fun to be had! You can also try it for yourself: there is a link to an online demo in the description, and if you post your results in the comments section, I guarantee there will be some amusing discussions.

I feel that soon, a new era of video games and movies will dawn where most of the digital models are drawn by computers. As automation and mass production are standard in many industries nowadays, we’ll surely be hearing people going: “Do you remember the good old times when video games were handcrafted? Man, those were the days!”

If you enjoyed this episode, make sure to subscribe to the series; we try our best to put out two of these videos per week. We would be happy to have you join our growing club of Fellow Scholars and be a part of our journey to the world of incredible research works such as this one. Thanks for watching and for your generous support, and I’ll see you next time!
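
The conditional part of the setup described in the transcript, where the discriminator judges the before and after images together rather than the after image alone, can be sketched in plain Python. This is only an illustration under loud assumptions: `toy_discriminator` is a hypothetical hand-written stand-in for a trained network, the "images" are two-element lists, and the losses are ordinary binary cross-entropy terms. It is not the project's actual code.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12  # clamp so that log() stays finite
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def toy_discriminator(before, after):
    """Hypothetical stand-in for a trained discriminator network.

    It scores the (before, after) PAIR and returns a value in (0, 1].
    The toy "translation" it believes in is after == 2 * before.
    """
    err = sum(abs(a - 2.0 * b) for b, a in zip(before, after)) / len(before)
    return math.exp(-err)


def discriminator_loss(d, before, real_after, fake_after):
    """Low when D calls the real pair real and the generated pair fake."""
    return bce(d(before, real_after), 1.0) + bce(d(before, fake_after), 0.0)


def generator_loss(d, before, fake_after):
    """Low when D mistakes the generated pair for a real one."""
    return bce(d(before, fake_after), 1.0)


before = [0.1, 0.5]       # e.g. an aerial photo, reduced to two "pixels"
real_after = [0.2, 1.0]   # the matching map from the training data
bad_fake = [0.9, 0.1]     # a poor generator output
good_fake = [0.2, 1.0]    # a generator output that matches the target

loss_bad = generator_loss(toy_discriminator, before, bad_fake)
loss_good = generator_loss(toy_discriminator, before, good_fake)
d_loss = discriminator_loss(toy_discriminator, before, real_after, bad_fake)

# Conditioning matters: the same "after" image paired with a different
# "before" image receives a different score.
score_matched = toy_discriminator(before, real_after)
score_mismatched = toy_discriminator([0.4, 0.0], real_after)
```

In a real system both functions would be neural networks updated in alternation, with the generator trying to lower `generator_loss` and the discriminator trying to lower `discriminator_loss`. The point of the sketch is only that the discriminator receives the input image as well, so a plausible-looking output that does not match its input can still be rejected.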

100 thoughts on “AI Makes Stunning Photos From Your Drawings (pix2pix) | Two Minute Papers #133”

  1. If they get something like this to work for videos, then it would be difficult for people to determine what's real and what's fake. People have already been fooled by photoshopped images. How long until people start manufacturing "evidence" for courts or generating "fake news" footage?

  2. Considering the amount of work you have to do for each video, it's incredible how many videos you upload. Just two words: Thank you!

  3. Have you ever heard of HTM (Hierarchical Temporal Memory)?
    It's kind of like a neural network but much better.
    The creators of HTM tried to mimic the neocortex and it works really well.
    They claim that it's the real way to true intelligence.
    I just think it's incredibly awesome and I don't think it gets the attention it deserves.
    After all, isn't it the neocortex that makes humans intelligent?
    p.s. Sorry for my bad English ;(

  4. So I was sitting in Spanish class the other day when I got an idea, and I want another take on it.
    You know how sites like Google translate always churn out characteristically faulty and unreliable translations? Well, would it be possible to improve machine translations with the aid of a general AI?
    How I imagine it would work would be similar to the other general AIs discussed on this channel. It would start out with the broken machine translation first, make its evolved changes to it, and then see how close it is to the same sentence translated by a professional translator (there are hundreds of books translated from English to Spanish every year, so a large training data set wouldn't be that hard to acquire I imagine.) The closer the edited bad translation is to the proper translation, the more fitness points the AI is rewarded with.
    Would this even work, or does language have too many subtle complexities for human-made code to be improved upon? I watch the papers featured here and especially in the image generation ones the neural networks' understanding of how RGB values come together to make recognizable images seems impossibly deep for a computer, surely this same understanding can be reached in regards to language? Or maybe I'm missing something in my ignorance?
    Anyway, fantastic channel man, it isn't often that someone finds such a small niche and pours so much effort into filling it!

  5. I like two minute papers, but some of these papers may end up strictly becoming fun apps on smart phones; never truly lifting off as with the case of VR.

  6. they look like they about to solve the riemann hypothesis, and I look like Ive just figured out how to draw phallic objects in the sand, by writing a temperature plotting thing. awesome vid tho

  7. Thanks for the evaluation AND for supplying the links! I'm already on GitHub, have set up my account, and am learning about code branches and repositories (sounds like a bank!). As the process of learning and sharing information is growing exponentially, my hat's off to you for sharing the latest of your discoveries and insights, and not waiting until your patent comes through to share what you have learned like so many others do. To create a new method is inspired, to share with the world, divine.

  8. Maybe it could be used for mouth movements in animation so eg. Pixar can dub content in each language and have the software animate mouths and have them work properly in all languages.

  9. Wish I could upload my drawings of my cat instead of redrawing with a mouse.

  10. Imagine video game graphics, like people's faces in games, being drawn in this fashion. That would be so good.
    Or cartoons, and if the voices are artificially crafted as well, the cartoons can run for decades without needing to hire voice actors.
    Imagine TV shows like The Expanse or Stargate but with no actors.
    Why use your voice on the radio when you can have a machine generate another one.
    this is great stuff.

  11. "Open the pod bay doors, HAL!" This is borderline terrifying to me. I don't want to sound backward-minded, but I can see no way all this progress wouldn't go wrong. Society is neither flexible enough nor smart enough (as a whole) to keep pace and adapt to AI, which will soon be able to evolve based on its own decisions. The tool will become smarter than the user. Change my mind. Please.

  12. It's really awesome!!!
    Can it be used for human beings, so that it can be helpful for the crime branch to identify criminals by the sketch drawn?

  13. WHEN the AI takes over and conquers the human race, we'll be lucky to be kept as pets instead of "a bundle of raw materials". This is a bad path to follow. I just hope the AI doesn't have this narrator's horribly irritating voice.

  14. Yeah, tried the demo, doesn't do anything.
    You draw what you are supposed to, it elaborates and then a white rectangle appears.

  15. AI will be able to automate creative jobs much faster than people think. We need to get ahead of the transformation before drastic changes cause damage in society. Andrew Yang is the only candidate running right now who is informed on automation and its effects.

  16. This is just like the photoshop tool that clones a texture, the healing brush, but more advanced. I mean, this is not "artists died", it's more like "great, we artists can design more and better things in less time". It's just a tool.

  17. Video games were never handcrafted; the computer was always there, and most of the textures were generic, tiled textures. This will be closer to handcrafted than the old games, if you consider the old games handcrafted at all. Also, there are things that don't exist yet, so artists will be creating those things. Then there is the price factor: you might have the technology, but if for most studios it is just cheaper to do something else, even with many artists working on it, then that will be better.

    For me, in the end this will be like a book. The artist will be the writer. The book will be the story generated with images, a 3D game, a world full of life, but a narrative of that world will still be there, and it will be artist driven. In my opinion machines will never replace artists because AI is not really possible, and even if it was, at that point no fucking one will need to work, so it's irrelevant. Africa would be rich like the rest of the world; if not, that means artists are still needed for political reasons.

    We are 200 years later saying "machines will replace humans, be scared" this started even before French revolution and nothing happened, in the end machines are just tools to speed up things or make them better.

  18. Data usage is not imagination. Mix and match is not imagination. It can make good pictures, but art is more than that. Art is "spirit", and for now AI cannot do that.

  19. c'mon man, there's not "tons" of links in the description. There's a good bit, sure, but the way you emphasized "tons" was a bit of an exaggeration, wouldn't you say? I mean, if I clicked on the description and saw a wall of links, from the top of my screen to the bottom, then I would agree with your "tons" claim, but that wasn't the case. You should probably say something like "check the description, I put less than a ton but more than a few links in there for you" just to be safe. Let's not get bogged down with this "tons" business.

    Cool vid btw, I like cats.

  20. Why is everyone working with images but seemingly no one uses these for sound? I want an algorithm that upsamples low quality sounds, another that is able to isolate specific sounds from a jumbled mess (like a voice from a crowd, or a single instrument from a song), or maybe real time conversion of input speech into an entirely different voice, not just altering the input with filters, but recreating the speech from scratch using an AI. So many cool things to do with sound and no one does any of them.

    i mean, except for the synth voices that have gotten so convincing over time. but that's text-to-speech. i want speech-to-speech, preserving (or altering) intonation and all the nuances.

  21. This could be a big deal for 2d animation. Draw some line art and a detailed illustration, and you might get extremely detailed frames!
