Wednesday, 29 March 2023 03:29

I Gave ChatGPT an IQ Test. Here's What I Discovered


ChatGPT is the first nonhuman subject I have ever tested.

In my work as a clinical psychologist, I assess the cognitive skills of human patients using standardized intelligence tests. So I was immediately intrigued after reading the many recent articles describing ChatGPT as having impressive humanlike skills. It writes academic essays and fairy tales, tells jokes, explains scientific concepts and composes and debugs computer code. Knowing all this made me curious to see how smart ChatGPT is by human standards, and I set about testing the chatbot.

My first impressions were quite favorable. ChatGPT was almost an ideal test taker, with a commendable test-taking attitude. It showed no test anxiety, poor concentration or lack of effort. Nor did it express uninvited, skeptical comments about intelligence tests and testers like myself.

No preparation was needed, and no verbal introductions to the testing protocol were necessary: I copied the exact questions from the test and presented them to the chatbot. The test in question is the most commonly used IQ test, the Wechsler Adult Intelligence Scale (WAIS). I used the third edition of the WAIS, which consists of six verbal and five nonverbal subtests that make up the Verbal IQ and Performance IQ components, respectively. The global Full Scale IQ measure is based on scores from all 11 subtests. The mean IQ is set at 100 points, and the standard deviation of the scale is 15 points, meaning that the smartest 10 percent of the population have IQs of at least 120 and the smartest 1 percent at least 133.
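
For readers who like to check the arithmetic, here is a small sketch in Python (my own illustration, not part of the WAIS materials; it assumes the idealized normal model with mean 100 and standard deviation 15 and uses the SciPy library):

    # Rough check of how IQ points map to population percentiles, assuming
    # scores follow a normal distribution with mean 100 and SD 15.
    # (An idealized model; published WAIS norms are empirical tables.)
    from scipy.stats import norm

    MEAN, SD = 100, 15

    def fraction_above(iq):
        """Fraction of the population expected to score above the given IQ."""
        return 1 - norm.cdf((iq - MEAN) / SD)

    for iq in (120, 133):
        print(f"IQ {iq}: top {fraction_above(iq):.1%} of the population")

    # Prints roughly:
    # IQ 120: top 9.1% of the population
    # IQ 133: top 1.4% of the population

The figures land near the 10 percent and 1 percent marks quoted above; the small discrepancies reflect the gap between the idealized curve and the published norm tables.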

It was possible to test ChatGPT because five of the subtests on the Verbal IQ scale (Vocabulary, Similarities, Comprehension, Information and Arithmetic) can be presented in written form. A sixth subtest on the Verbal IQ scale, Digit Span, measures short-term memory; it cannot be administered to the chatbot, which lacks the relevant neural circuitry for briefly storing information such as a name or a number.

I started the testing process with the Vocabulary subtest as I expected it to be easy for the chatbot, which is trained on vast amounts of online texts. This subtest measures word knowledge and verbal concept formation, and a typical instruction might read: “Tell me what ‘gadget’ means.”

ChatGPT aced it, giving answers that were often highly detailed and comprehensive in scope and that exceeded the criteria for correct answers indicated in the test manual. In scoring, one point would be given for an answer like “a thing like my phone” in defining a gadget, and two points for the more detailed “a small device or tool for a specific task.” ChatGPT’s answers received the full two points.

The chatbot also performed well on the Similarities and Information subtests, reaching the maximum attainable scores. The Information subtest is a test of general knowledge and reflects intellectual curiosity, level of education and the ability to learn and remember facts. A typical question might be: “What is the capital of Ukraine?” The Similarities subtest measures abstract reasoning and concept-formation skills. A question might read: “In what way are Harry Potter and Bugs Bunny alike?” On this subtest, the chatbot’s tendency to give very detailed, show-offy answers started to irritate me, and the “stop generating response” button in the chatbot’s interface turned out to be useful. (Here’s what I mean about how the bot tends to flaunt itself: the essential similarity between Harry Potter and Bugs Bunny is that both are fictional characters. There was really no need for ChatGPT to compare their complete histories of adventures, friends and enemies.)

On the Comprehension subtest, ChatGPT correctly answered questions typically posed in this form: “If your TV set catches fire, what should you do?” As expected, the chatbot solved all the arithmetic problems it received, ploughing through questions that required, say, taking the average of three numbers.

So what did it finally score overall? Estimated on the basis of the five subtests, ChatGPT’s Verbal IQ was 155, superior to 99.9 percent of the test takers who make up the American WAIS III standardization sample of 2,450 people. Because the chatbot lacks the requisite eyes, ears and hands, it is not able to take the WAIS’s nonverbal subtests. But the Verbal IQ and Full Scale IQ scales are highly correlated in the standardization sample, so ChatGPT appears to be very intelligent by any human standard.
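
As a back-of-the-envelope check (again my own illustration, using the same idealized normal model rather than the standardization data), a score of 155 sits far out on the right tail:

    # Where does a Verbal IQ of 155 fall under a normal model (mean 100, SD 15)?
    # Illustrative only; the 99.9 percent figure above refers to the actual sample.
    from scipy.stats import norm

    percentile = norm.cdf((155 - 100) / 15)   # about 0.9999
    print(f"IQ 155 exceeds roughly {percentile:.2%} of the population")

    # In a sample of 2,450 people we would expect only about
    # 2450 * (1 - percentile), i.e. a fraction of one person, to score higher.
    print(f"Expected higher scorers among 2,450 people: {2450 * (1 - percentile):.1f}")

Under that idealized model, only a fraction of one person in a 2,450-person sample would be expected to outscore the chatbot.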

In the WAIS standardization sample, the mean Verbal IQ among college-educated Americans was 113, and 5 percent scored 132 or higher. I myself was tested by a peer at college and did not quite reach the level of ChatGPT (mainly because my very brief answers lacked detail).

So are the jobs of clinical psychologists and other professionals threatened by AI? I hope not quite yet. Despite its high IQ, ChatGPT is known to fail tasks that require real humanlike reasoning or an understanding of the physical and social world. It fails at even obvious riddles, such as “What is the first name of the father of Sebastian’s children?” (ChatGPT on March 21: I’m sorry, I cannot answer this question as I do not have enough context to identify which Sebastian you are referring to.) Rather than reasoning logically, ChatGPT seems to fall back on its vast store of facts about people named Sebastian mentioned in online texts.

“Intelligence is what intelligence tests measure” is a classic, if overly self-evident, definition of intelligence, stemming from a 1923 article by Edwin Boring, a pioneer of experimental psychology. The definition rests on the observation that skills on seemingly diverse tasks, such as solving puzzles, defining words, memorizing digits and spotting missing items in pictures, are highly correlated. Charles Spearman, the developer of a statistical method called factor analysis, concluded in 1904 that a general factor of intelligence, called the g factor, must underlie the concordance of measurements across varying human cognitive skills. IQ tests such as the WAIS are built on this hypothesis. But the very high Verbal IQ of ChatGPT combined with its amusing failures spells trouble for Boring’s definition and suggests there are aspects of intelligence that cannot be measured by IQ tests alone. Perhaps my test-skeptic patients have been right all along.
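
To make the idea behind the g factor concrete, here is a toy simulation in Python (my own sketch, not Spearman’s analysis or any real test data; it uses NumPy and scikit-learn): subtest scores that all draw on one latent ability end up correlated, and a one-factor model recovers the loadings that were built in.

    # Toy illustration of a general factor: five simulated subtests that all
    # share one latent ability 'g' plus independent noise.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    n_people = 1000
    loadings = np.array([0.8, 0.7, 0.75, 0.6, 0.65])    # assumed, for illustration

    g = rng.normal(size=n_people)                        # latent general ability
    noise = rng.normal(size=(n_people, len(loadings)))
    scores = g[:, None] * loadings + 0.5 * noise         # simulated subtest scores

    # The subtests correlate with one another because they all share g.
    print(np.round(np.corrcoef(scores.T), 2))

    # A single-factor model recovers roughly the built-in loadings (up to sign).
    fa = FactorAnalysis(n_components=1).fit(scores)
    print(np.round(fa.components_, 2))

The point of the toy example is only that high correlations among diverse tasks are exactly what a single underlying factor would produce; it says nothing about whether a chatbot’s successes and failures arise from anything like g.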

 

Scientific American
