OpenAI Plays Hide and Seek…and Breaks The Game!

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. In this project, OpenAI built a hide-and-seek game for their AI agents to play. While we look at the exact rules here, I will note that the goal of the project was to pit two AI teams against each other, and hopefully see some interesting emergent behaviors. And, boy, did they do some crazy stuff. The coolest part is that the two teams compete against each other, and whenever one team discovers a new strategy, the other one has to adapt. Kind of like an arms race situation, and it also resembles generative adversarial networks a little. And the results are magnificent, amusing, weird: you’ll see in a moment.

These agents learn from previous experiences, and, to the surprise of no one, for the first few million rounds we start out with… pandemonium. Everyone is just running around aimlessly. Without proper strategy and with semi-random movements, the seekers are favored and hence win the majority of the games. Nothing to see here. Then, over time, the hiders learned to lock out the seekers by blocking the doors off with these boxes, and started winning consistently. I think the coolest part about this is that the OpenAI scientists deliberately designed the map so that the hiders can only succeed through collaboration. They cannot win alone, and hence they are forced to learn to work together. Which they did, quite well.

But then, something happened. Did you notice this pointy, doorstop-shaped object? Are you thinking what I am thinking? Well, probably, and not only that, but about 10 million rounds later, the AI also discovered that it can be pushed near a wall and used as a ramp, and, tadaa! Got’em! The seekers started winning more again.

So, the ball is now back in the hiders’ court. Can you defend this? If so, how? Well, these resourceful little critters learned that there is a little time at the start of the game when the seekers are frozen and, apparently, cannot see them, so why not just sneak out, steal the ramp, and lock it away from them? Absolutely incredible. Look at those happy eyes as they are carrying that ramp.

And, you think it all ends here? No, no, no. Not even close. It gets weirder. Much weirder. When playing on a different map, a seeker noticed that it can use a ramp to climb on top of a box, and this happens. Do you think couchsurfing is cool? Give me a break! This is box surfing! And the scientists were quite surprised by this move, as this was one of the first cases where the seeker AI seems to have broken the game. What happens here is that the physics system is coded in a way that the agents are able to move around by exerting force on themselves, but there is no additional check whether they are on the floor or not, because who in their right mind would think about that? As a result, something that shouldn’t ever happen does happen here.

And we’re still not done yet; this paper just keeps on giving. A few hundred million rounds later, the hiders learned to separate all the ramps from the boxes. Dear Fellow Scholars, this is proper box surfing defense… then, lock down the remaining tools and build a shelter. Note how well rehearsed and executed this strategy is: there is not a second of time left until the seekers take off. I also love this cheeky move where they set up the shelter right next to the seekers, and I almost feel like they are saying “yeah, see this here? There is not a single thing you can do about it.”

In a few isolated cases, other interesting behaviors also emerged. For instance, the hiders learned to exploit the physics system and just chuck the ramp away. After that, the seekers go “what? What just happened?” But don’t despair, and at this point, I would also recommend that you hold on to your papers, because there was also a crazy case where a seeker learned to abuse a similar physics issue and launch itself exactly onto the top of the hiders. Man, what a paper.

This system can be extended and modded for many other tasks too, so expect to see more of these fun experiments in the future. We get to do this for a living, and we are even being paid for this. I can’t believe it. In this series, my mission is to showcase beautiful works that light a fire in people. And this is, no doubt, one of those works. Great idea, interesting, unexpected results, crisp presentation. Bravo OpenAI! Love it.

So, did you enjoy this? What do you think? Make sure to leave a comment below. Also, if you look at the paper, it contains comparisons to an earlier work we covered about intrinsic motivation, shows how to implement circular convolutions for the agents to detect their environment around them, and more. Thanks for watching and for your generous support, and I’ll see you next time!
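
The box-surfing explanation in the transcript comes down to one missing precondition. Agents move by exerting force on themselves, and nothing checks whether they are standing on the floor. The toy sketch below illustrates only that idea. It is not OpenAI's physics code. The `Agent` class, the `apply_self_force` function, and the `require_floor_contact` flag are hypothetical names chosen for illustration. The sketch runs the same movement step with and without the floor check.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent: 1D horizontal position, velocity, and its support surface."""
    x: float = 0.0
    vx: float = 0.0
    standing_on: str = "floor"  # "floor" or, e.g., "box"


def apply_self_force(agent: Agent, force: float, dt: float = 0.1,
                     mass: float = 1.0, require_floor_contact: bool = False) -> None:
    """Advance one toy Euler step in which the agent pushes itself with `force`.

    With require_floor_contact=False (the behavior described in the video),
    an agent can propel itself no matter what it stands on, including a box.
    """
    if require_floor_contact and agent.standing_on != "floor":
        return  # the missing check: no self-propulsion without floor contact
    agent.vx += (force / mass) * dt
    agent.x += agent.vx * dt


# Without the check, an agent standing on a box still moves.
surfer = Agent(standing_on="box")
apply_self_force(surfer, force=5.0)

# With the check, the same agent stays where it is.
grounded_only = Agent(standing_on="box")
apply_self_force(grounded_only, force=5.0, require_floor_contact=True)
print(surfer.x > 0.0, grounded_only.x == 0.0)  # True True
```

In this toy, the fix is a one-line guard. The hard part is realizing in advance that the guard is needed at all, which is the transcript's point.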

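The transcript mentions that the paper shows how to implement circular convolutions so the agents can sense their surroundings. As a rough illustration, assume the agent reads a ring of distance measurements around itself; that ring layout is an assumption of this sketch. A circular 1D convolution then wraps around the ends of the ring. The first and last readings point in neighboring directions, so they are treated as neighbors. The function name `circular_conv1d` and the example readings are hypothetical. Like most deep learning layers, the function computes cross-correlation, meaning the kernel is not flipped.

```python
from typing import List, Sequence


def circular_conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Circular 1D convolution (cross-correlation, as in deep learning layers).

    Indices wrap around, so the input behaves like a ring of sensor readings:
    position 0 and position len(signal) - 1 are neighbors.
    The kernel is centered on each position (odd kernel length assumed).
    """
    n, k = len(signal), len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    half = k // 2
    out: List[float] = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += w * signal[(i + j - half) % n]  # wrap-around indexing
        out.append(acc)
    return out


# Hypothetical ring of 8 readings; only index 0 differs from the rest.
ring = [5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
smoothed = circular_conv1d(ring, [1.0, 1.0, 1.0])
print(smoothed)  # [7.0, 7.0, 3.0, 3.0, 3.0, 3.0, 3.0, 7.0]
```

The last output entry picks up the value at index 0. With zero padding instead of wrap-around, that neighbor would be lost at the seam of the ring.
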
100 thoughts on “OpenAI Plays Hide and Seek…and Breaks The Game!”

  1. Man, what are the implications if we bring into this the Hindu idea that we're all the Brahman playing hide and seek with itself?

  2. This is what is terrifying about AIs. They can't be stopped, and they'll achieve their goals no matter what, no matter how hard they cheat or hurt others. They are the ultimate mind.

  3. It's 2019, we want more than RL that takes millions of attempts to reach an evolutionary equilibrium. Deep learning is now evolutionary algorithms 2.0, but with more model parameters.

  4. Scenario: All ramps are gone, hiders are hidden, and seekers are seemingly unable to do anything.
    Screen starts glitching and the computer starts humming and shaking loudly as the seekers start a summoning ritual.
    Wait, we didn't program this in.

  5. I would like to see a neural network AI play GTA 5 Online and eventually get to a point where it will "hire" real players to do its bidding.

  6. Question: does knowledge transfer between levels? Do they grasp the concepts and utility of each object, or just learn specific moves?

  7. What makes the AI try something different after a million tries of the same thing? I could understand trying something different after each try, but not way after.

  8. Okay, let me try to see how I would always win at hide and seek. Probably just mirror the person who is seeking. Okay, let me watch the video and look like an idiot for not finding a better way.

  9. Can you make a version of this with the seekers being T-1s and the hiders being T-800s?

    Or the seekers being T-800s and the hiders being Kyle Reese, John Connor and Sarah Connor.

  10. Please enhance the graphics to the point where the seekers are Terminators and the hiders are humans. Then change the detecting vision to shooting.

  11. I once wrote an AI to play chess. I coded the chess game myself, and it turns out there was a bug that was very specific and would cause all of your own pawns to turn into queens. I didn't learn about the bug until my AI found out how to exploit it. It still took me a long time to figure out what was happening, and I basically just had to analyze every situation in which the AI used the bug in order to figure out how it did it.

  12. Can someone tell me how long these simulations take? I didn't expect there to be like 6 billion simulations for this OpenAI stuff.

  13. Irony is that a "regular" data scientist or analytics professional working on real-life problems using simple supervised ML, and occasionally a bit of deep learning, gets paid much less than these AI research guys, who are effectively writing code to play video games using reinforcement learning. After looking at this video, I'm glad I didn't pursue my PhD offer and entered industry instead.

  14. Wow… I didn't think AI would replace the job of Professional Video Game SpeedRunners.

    You all better vote for Andrew Yang.

  15. OpenAI is scary. They even beat most of the professional teams at their own game. It almost feels like sooner or later they'll use this for some technological warfare or something.

  16. Hmmm… being generous with the average game length, let's say 15 seconds… at 100,000,000 rounds… that would be 47.533147 years
    … they must have started this quite some time back!!

  17. This looks like it could be pretty good for video game AI.

    My question is, since there were millions of generations, how were the conductors able to even notice these generations where the AI broke the game?

  18. That box surfing is just an object fly glitch. The AI hasn't broken the game; it has acted like a playtester would XD

  19. They don't exploit anything. They just use it according to how it was designed. If you think about it, there is actually nothing special about the way they interact with the physics system.

  20. A trillion iterations later, seekers rewrite the code to delete the shared pointer of one of the hiders. The other hider catches on and changes the code to press ALT + F4, and the application exits with a pop-up window or message box with the caption: "See you suckers, I'm out of here! …"
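
Comment 16's estimate assumes that every game is played one after another, in real time. Large-scale reinforcement learning runs typically simulate many games in parallel, and a simulator need not run at real-time speed, so the wall-clock time can be far shorter. The sketch below redoes the arithmetic. The helper name `wall_clock_years` and the figure of 1,000 parallel games are hypothetical illustrations, not the setup OpenAI actually used.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds


def wall_clock_years(rounds: int, seconds_per_round: float,
                     parallel_games: int = 1, speedup: float = 1.0) -> float:
    """Wall-clock years needed to play `rounds` games.

    parallel_games: how many games run at the same time (hypothetical knob).
    speedup: how many times faster than real time one game is simulated.
    """
    total_seconds = rounds * seconds_per_round / (parallel_games * speedup)
    return total_seconds / SECONDS_PER_YEAR


serial = wall_clock_years(100_000_000, 15)  # the commenter's assumptions
parallel = wall_clock_years(100_000_000, 15, parallel_games=1000)  # hypothetical
print(f"serial: {serial:.2f} years, parallel: {parallel * 365.25:.1f} days")
```

Run serially, the commenter's numbers give about 47.53 years, matching the comment. Spreading the same games over a hypothetical 1,000 simultaneous copies brings this down to roughly 17.4 days, before any faster-than-real-time simulation is counted.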
