AIs Are Getting Too Smart – Time For A New “IQ Test” 🎓

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. In a world where learning-based algorithms are rapidly becoming more capable, I increasingly find myself asking the question: “so, how smart are these algorithms, really?”. I am clearly not alone with this. To answer this question, a number of tests have been proposed, and many of them share one important design decision: they are very difficult to solve without generalized knowledge.

In an earlier episode, we talked about DeepMind’s paper where they created a bunch of randomized mind-bending, or in the case of an AI, maybe silicon-bending questions that looked quite a bit like a nasty, nasty IQ test. Even in the presence of additional distractions, their AI did extremely well. I noted that on this test, finding the correct solution around 60% of the time would be quite respectable for a human, while their algorithm succeeded over 62% of the time, and upon removing the annoying distractions, this success rate skyrocketed to 78%. Wow.

More specialized tests have also been developed. For instance, scientists at DeepMind also released a modular math test with over 2 million questions, on which their AI did extremely well at tasks like interpolation and rounding decimals and integers, whereas it was not too accurate at detecting primality or at factorization. Furthermore, a little more than a year ago, the GLUE benchmark appeared, which was designed to test the natural language understanding capabilities of these AIs. When benchmarking state-of-the-art learning algorithms, they found that these were approximately 80% as good as fellow non-expert human beings. That is remarkable. Given the difficulty of the test, they were likely not expecting human-level performance, which you see marked with the black horizontal line, and which was surpassed in less than a year.

So, what do we do in this case? Well, as always, of course: design an even harder test. In comes SuperGLUE, the paper we’re looking at today, which is meant to provide an even harder challenge for these learning algorithms. Have a look at these example questions here. For instance, this time around, reusing general background knowledge gets more emphasis in the questions. As a result, the AI has to be able to learn and reason with more finesse to answer them successfully. Here you see a bunch of examples, and you can see that these are anything but trivial little tests for a baby AI: not all, but some of these are calibrated for humans at around college-level education.

So, let’s have a look at how the current state-of-the-art AIs fared in this one! Well, not as well as humans, which is good news, because that was the main objective. However, they still did remarkably well. For instance, BoolQ contains a set of yes-and-no questions, and on these, the AIs are reasonably close to human performance. On MultiRC, the multi-sentence reading comprehension task, they still do OK, but humans outperform them by quite a bit. Note that you see two numbers for this test; the reason is that there are multiple test sets for this package. Also note that on the second one, even humans seem to fail almost half the time, so I can only imagine the revelation we’ll have a couple more papers down the line.

I am very excited to see that, and if you are too, make sure to subscribe and hit the bell icon so you don’t miss future episodes. Thanks for watching and for your generous support, and I’ll see you next time!
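
A quick note for Fellow Scholars who would like to poke at these tasks themselves: below is a minimal Python sketch that loads the BoolQ yes/no questions from SuperGLUE and prints a few of them. The video does not prescribe any tooling, so the use of the Hugging Face datasets library and the "super_glue"/"boolq" identifiers is an assumption on my part, and the exact identifiers may differ between library versions.

    # Minimal sketch: peek at SuperGLUE's BoolQ task (a set of yes/no questions).
    # Assumes the Hugging Face `datasets` library is installed (pip install datasets);
    # the dataset identifiers below may differ depending on the library version.
    from datasets import load_dataset

    # Each BoolQ example is a short passage plus a yes/no question about it.
    boolq = load_dataset("super_glue", "boolq", split="validation")

    for example in boolq.select(range(3)):
        print("Passage: ", example["passage"][:200], "...")
        print("Question:", example["question"])
        # The label is 1 for "yes" and 0 for "no".
        print("Answer:  ", "yes" if example["label"] == 1 else "no")
        print("-" * 40)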

43 thoughts on “AIs Are Getting Too Smart – Time For A New “IQ Test” 🎓”

  1. You mention that your sponsor is cheaper than their competitors. Is that before or after we've paid lawyers for an opinion on the kind of risk we are taking by using NVidia's consumer graphics cards in the cloud, which they don't license their CUDA and cuDNN drivers for, as far as I know?

  2. I must have missed the previous episode about IQ with those great results. I suggest you add them to the description.

  3. Let's take a regular 'human' IQ test given to a 13-year-old with NO prior exposure to an IQ test. Putting aside Eurocentric and other concerns, the AI that will really scare and excite me is one that clearly succeeds at 'generalised knowledge' without specific pre-training on the general nature of the test.

  4. The IQ test has been flawed for some time; the ability to process data is not actual intelligence. The IQ test does not accurately represent how humans think in conjunction with two halves of a brain that have conflicting goals for understanding the world around them.

  5. “The police aimed their weapons at the fugitive”

    Bot: option 2! 98.4% certainty!

    Human: smh, you naïve robot, in this reality, option one is what actually results.

  6. I heavily disagree with using syntax comparisons for understanding, especially as we can obviously derive meaning from those who do not speak English properly, as well as from the mutation of language.

  7. "5. Solve 2x^2+9x+7", I really hope there weren't too many questions like this, makes me think that the result might not be too trustworthy.

  8. Maybe soon we'll develop intuitive programming practices and cheap access to cloud computing – so that even low IQ humans can do high IQ tasks with the use of these obedient, selfless algorithms.

  9. Some serious context understanding is necessary in this one. I'm not sure I could actually beat all parts of this quiz all that well myself.

  10. Hello everyone, I am currently writing my bachelor thesis, more specifically on neural networks in relation to public administration. I want to show possible application examples for NNs.

    I'm looking for cases where NNs are already in use or in development in public administration.

    So far, I have only heard of optical sorting of letters in Saarland.

    Please get in touch.

  11. IQ tests seem like BS when used on humans, except for seeing a difference between the extremes of a person with brain damage and/or malformation compared to a 'regular' person, and even that doesn't tell you much except that there's a difference in results. Is it something similar when testing 'AI'?

  12. This is one of my favorite channels and it should definitely participate in the #TeamTrees initiative!! 😀😀😀😀

  13. I never thought that a high school exam was something an AI could ever solve with only high school textbooks as training data.
    But I guess these AIs had a wider corpus to draw from.

  14. Oh yeah, there is a new paper from Google that achieves almost equivalent performance to humans on SuperGLUE. It's called "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (T5).

  15. Were the AIs trained on the test itself? If so, then it's not really comparable to a human who only does the test once, with the knowledge gained through life. For the comparison to be fair, the AI should be trained on a dataset similar to what a human would learn in preparation for such a test.

  16. I failed English, and I have a question. The sentence "What fun to hear Artemis laugh. She's such a serious child. I didn't know she had a sense of humor."

    Isn't 'had' a word meaning previous ownership? So this sentence means to me that Artemis lost the sense of humor she once possessed. Wouldn't a better word be 'has'?

  17. Are these first-attempt results? Is it using specific (narrow) or general intelligence? A result of 78% versus a human's 62% corresponds to what IQ? 126?

  18. Makes me think of the rise of chess engines. I remember the days when I could beat a chess engine with ease; these days not even GMs stand a chance against these beasts.

  19. I can guarantee that when those cheeky engineers named it GLUE, they knew perfectly well that it would become outdated and that they would then get a chance to name its successor SuperGLUE. You can bet your papers on this!

  20. I still feel there's a piece of the jigsaw missing from AIs. And until we can figure out what it is, we'll never get AIs to be like humans. <— like how could an AI come up with the sentence I just wrote… and why would it?

  21. Not so much smart as they are good optimisers!
    That's all they really are.
    There needs to be another big leap if they are to become smart and intelligent!

  22. The sentient AI raised their weapon at the human rebel. What is the most likely outcome?

    1. The AI quells the rebellion.
    2. The rebellion defeats the AI.

    Every AI simulation picks option 1.

  23. To be honest, I'm still a fan of the Turing test as a general heuristic for intelligence. Humans can respond sensibly to a potentially infinite number of sentences (because of recursion, and also because language is flexible and we use it creatively), and one can engage with nearly any field or topic through language. Because the set of potential problems/sentences to respond to is nearly infinite, any AI we deem generally intelligent should be around or exceeding human performance in nearly every scenario. So, I've always thought it's a good goal.
