ChatGPT broke the Turing test — the race is on for new ways to assess AI : technology

[–] [email protected] 114 points 11 months ago (5 children)

The fundamental flaw of the Turing test is that it requires a human. Apparently, making a human believe they are talking to a human is much easier than previously thought.

[–] [email protected] 60 points 11 months ago (2 children)

Much easier, in fact; ELIZA could pass the Turing test in 1966. Humans are incredibly eager to assess other things as being human or human-like.

[–] [email protected] 17 points 11 months ago

Go on.

And what makes you think that?

Mhm. Tell me more.

"Human or human-like". Can you tell me more about that?

How do you feel about it?

[–] [email protected] 15 points 11 months ago* (last edited 11 months ago)

The real Turing test requires an expert doing the test, not just some random easily impressed person.

The ELIZA-style bots work very well on the latter kind, as the bot is just repeating your own text back at you with some grammatical remixing, e.g. you say "I am afraid of horses", bot says "Why do you say you are afraid of horses?". You can have a very long conversation with yourself that way, as the bot contributes nothing to the discussion. It just provides enough plausible English to keep you talking. Meanwhile, when you have an expert (or really just any person with a little bit of a clue) test ELIZA, the bot falls completely apart within just three lines of dialog. The bot is incredibly basic and really can't do anything by itself; it completely depends on the user to provide all the content of the conversation.
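To make the "grammatical remixing" concrete, here is a minimal sketch of that echo trick in Python. It is not the original 1966 ELIZA script; the `REFLECTIONS` table, the two `RULES`, and the `FALLBACK` line are assumptions made up for illustration.

```python
import re

# Hypothetical pronoun swaps; the real ELIZA script was far larger.
REFLECTIONS = {
    "i": "you", "am": "are", "my": "your", "me": "you",
    "you": "I", "your": "my",
}

# Hypothetical (pattern, template) rules, tried in order.
RULES = [
    (re.compile(r"i am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"i feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
]

# Canned line used when no rule matches, so the user keeps talking.
FALLBACK = "Tell me more."


def reflect(fragment: str) -> str:
    """Swap first- and second-person words so the echo reads naturally."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(user_input: str) -> str:
    """Return the first matching template, filled with the user's own words."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.fullmatch(text)
        if match:
            return template.format(reflect(match.group(1)))
    return FALLBACK
```

Calling `respond("I am afraid of horses")` gives back "Why do you say you are afraid of horses?", while any input the rules don't cover just gets the fallback. Every bit of content in the reply comes from the user, which is why it collapses as soon as someone asks it an actual question.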

[–] [email protected] 50 points 11 months ago (2 children)

You can take a sharpie and draw a sad face on a rock and then you'll feel sad for it. We're gullible.

[–] [email protected] 57 points 11 months ago (1 children)

But why is the rock sad :(

[–] [email protected] 31 points 11 months ago

I know.. I get sad just thinking about the sad rock :(


[–] [email protected] 12 points 11 months ago

Slap some 2D anime girl avatar on it and you got yourself a top grossing v-tuber.

[–] [email protected] 11 points 11 months ago* (last edited 11 months ago) (1 children)

A test that didn't require a human could theoretically be run automatically by the machine ahead of time and solved easily.

I can't imagine how you would test this in a way that wouldn't require a human.

[–] [email protected] 4 points 11 months ago (3 children)

Let two AIs talk to each other and see if they find out that they both aren't humans?

[–] [email protected] 6 points 11 months ago* (last edited 11 months ago)

Bro, humans literally don't have that capability (that's the presumption here). Or are you saying that many of us don't have better consciousness than AIs? I might agree with that!


[–] [email protected] 6 points 11 months ago

Why is it a flaw? What do you think the Turing Test is?

[–] [email protected] 59 points 11 months ago (48 children)

That's because it was built to beat the Turing test. The test was flawed. ChatGPT is just a Chinese room.

[–] [email protected] 20 points 11 months ago (3 children)

What is a Chinese room?

[–] [email protected] 72 points 11 months ago

Imagine that you're locked in a room. You don't know any Chinese, but you have a huge instruction book written in English that tells you exactly how to respond to Chinese writing. Someone outside the room slides you a piece of paper with Chinese writing on it. You can't understand it, but you can look up the characters in your book and follow the instructions to write a response.

You slide your response back out to the person waiting outside. From their perspective, it seems like you understand Chinese because you're providing accurate responses, but actually, you don't understand a word. You're just following instructions in the book.
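As a rough sketch of the setup, the room boils down to a lookup: the operator matches the incoming symbols against the book and copies out whatever the book says. The `RULE_BOOK` entries and `DEFAULT_REPLY` below are hypothetical, and Searle's book is imagined as vastly more elaborate than a two-entry table.

```python
# Hypothetical rule book: maps incoming symbols to outgoing symbols.
# The operator never needs to know what either side means.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",
    "你叫什么名字？": "我叫小明。",
}

# Hypothetical reply copied out when the note is not in the book.
DEFAULT_REPLY = "请再说一遍。"


def room_operator(note: str) -> str:
    """Look the note up in the book and copy out the matching reply."""
    return RULE_BOOK.get(note, DEFAULT_REPLY)
```

Nothing in `room_operator` depends on what the symbols mean, which is the point of the thought experiment: correct-looking output alone doesn't tell you whether anything in the room understands.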

[–] [email protected] 39 points 11 months ago* (last edited 11 months ago)

It's a thought experiment involving a room where people write letters and shove them under the door of the Chinese kid's dorm room. He doesn't understand what's in the letters, so he just forwards the mail randomly to his Russian and Indian neighbours, who sometimes react angrily or happily depending on the content. Over time the Chinese kid learns which symbols make the Russian happy and which symbols make the Indian kid happy, and so forwards the mail correspondingly, until he starts dating and gets a girlfriend who tells him that people really shouldn't be shoving mail under his door, and he shouldn't be forwarding mail he doesn't understand for free.

[–] [email protected] 22 points 11 months ago (2 children)

https://en.wikipedia.org/wiki/Chinese_room

[–] [email protected] 14 points 11 months ago (1 children)

Wow, solid wiki article! It's very hard to say anything on the subject that hasn't been said.

I didn't see the simple phrasing:

"What if the human brain is a Chinese Room?"

but that seems to fall under eliminative materialism replies.

Part of the Chinese Room program (both in our heads and in an AI) could be dedicated to creating the experience of consciousness.

Searle has no substantial logical reply to this criticism. He openly takes it on faith that humans have consciousness, which is funny because an AI could say the same thing.

[–] [email protected] 5 points 11 months ago* (last edited 11 months ago) (6 children)

The whole point of the Chinese room is that it doesn't need anything "dedicated to creating the experience of consciousness". It can pass the Turing test perfectly well without such a component. Therefore passing the Turing test - or any similar test based solely on algorithmic output - is not the same as possessing consciousness.


[–] [email protected] 4 points 11 months ago* (last edited 11 months ago) (4 children)

en.wikipedia.org/wiki/Chinese_room

Man, I love coming across terms like this.

Chinese Room, Chinese Walls, Dutch Treat, Dutch Uncle, Dutch Oven.


[–] [email protected] 17 points 11 months ago (4 children)

The Chinese room argument makes no sense to me. I can't see how it's different from how young children understand and learn language.

My 2-year-old sometimes unmistakably starts counting when playing (a countdown for lift-off). Most of the numbers are gibberish, but often he says a real number in the midst of it. He is clearly just copying and does not understand what counting is. At some point, though, he will not only count correctly but also be able to answer math questions. At what point does he “understand”? At what point would you consider that ChatGPT “understands”?

There was an old TV programme where some AI experts of the time discussed the Chinese room, but they used a Chinese restaurant for a more realistic setting. It ended with: “So if I walk into a Chinese restaurant, pick something out on the Chinese menu, and can answer anything the waiter may ask, in Chinese, do I know or understand Chinese?” I remember the parties agreeing to disagree at that point.

[–] [email protected] 9 points 11 months ago* (last edited 11 months ago)

Yes... the Chinese room experiment misses the point, because the Turing test was never really about figuring out whether or not an algorithm has "consciousness" (what is that even?)... but about determining if an algorithm can exhibit intelligent behavior that's equivalent to, or indistinguishable from, a human's.

The Chinese room is useless because the only thing it proves is that people don't know what consciousness is, or what they are even trying to test.

[–] [email protected] 6 points 11 months ago (9 children)

ChatGPT will never understand. LLMs have no capacity to do so.

To understand you need underlying models of real world truth to build your word salad on top of. LLMs have none of that.

[–] [email protected] 5 points 11 months ago (7 children)

What are your underlying models of the world built out of? Because I'm human, and mine are primarily built out of words.

How do you draw a line between knowing and understanding? Does a dog understand the commands it's been trained to obey?

[–] [email protected] 5 points 11 months ago

Your underlying model is not made out of words, but out of concepts. You can have multiple words that all map to the same concept, e.g. cosmos, universe, space. Or a single word that maps to different concepts.


[–] [email protected] 4 points 11 months ago* (last edited 11 months ago) (4 children)

For one thing, understanding implies that a word is linked to a mental concept. So if you say "The car is red", you first need to mentally compare the mental concept of "red" to the car in question.

The Chinese room bypasses all of that, it can say "The car is red" without ever having seen a red object at all.


[–] [email protected] 11 points 11 months ago* (last edited 11 months ago) (1 children)

My gripe with the Chinese room is that Searle argues that his inability to understand Chinese means the program doesn't understand Chinese, but I could say the same thing about the human body.

The neurons that operate your vocal cords have no idea what they're saying, nor do the ones in your hands have any idea what they're writing, yet they can speak and write exactly because your brain tells them what to do. Your brain is exactly like that book as far as your mouth and hand neurons are concerned.

They don't need to understand language at all for your brain to be able to understand it and give instructions based on that understanding.

My only question is: at what point does an algorithm become sufficiently advanced that it is indistinguishable from a conscious being?

Because at the end of the day, most of what a brain does is information processing based on what it has previously learnt, and that's exactly what the algorithm is doing based on training data. A sufficiently advanced algorithm should surely be able to replicate understanding.

Sure, that isn't ChatGPT as we know it. You can tell from its sometimes very zany responses that while it understands which words are valid responses, it doesn't understand what the words themselves mean. But we should reach that at some point, no?

[–] [email protected] 8 points 11 months ago (4 children)

Keep in mind ChatGPT is a language model. It's designed specifically to simulate sounding like a human. It does that... Okay. It doesn't understand the information or concepts it is using. It just sounds like it does. It can't reliably do basic maths and doesn't try or need to. It just needs to talk about it in a believably conversational way.

The brain does far more than process information. And ChatGPT doesn't even really do that.


[–] [email protected] 9 points 11 months ago* (last edited 11 months ago)

Well, mostly the flaw is people assigning the test abilities it was never intended to have, like testing intelligence. At the very start of the paper presenting the "imitation game", Turing explicitly moved away from testing intelligence, since he didn't know how to do that. Even as a way of "testing intelligent kinds of behavior" (really human-like behavior, with the human serving as a proxy for intelligence), it was mostly an academic research idea, not a concrete test meant to be some milestone.

If the meaning of the words ‘machine’ and ‘think’ are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, ‘Can machines think?’ is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.

Turing wanted a way to step away from dealing with "thinking" and "intelligence" directly, and proposed the "imitation game" mostly to the rest of academia as a way to develop computer systems towards "intelligent behavior". It was mostly like "hey, we need some goal to move towards with these intelligence things. This isn't intelligence, but it might be a useful goal or tool for development work", since without some goal or aim, projects don't advance. So it was "how about we try to develop a thing that can beat this imitation game? Wouldn't that be a good stepping stone? Then we can move on to the actual serious stuff. Just an idea".

However, since this academic "thinking out loud, spitballing ideas" was uttered by the Alan Turing, it became the Turing Test and everyone started taking it way too seriously, especially outside academia. Academics, yes, did play the imitation game with their programs, as it was intended: as a research and development tool.

This is exemplified by, for example, this little excerpt of "not trying to do anything too complete and groundbreaking here":

In any case there is no intention to investigate here the theory of the game, and it will be assumed that the best strategy is to try to provide answers that would naturally be given by a man

It is pretty literally "I had a thought". Turing makes no claims that a machine beating the game has any significance other than "a machine beat this game I came up with, neat". There is no argument of the form "if a machine beats the imitation game, then X" or "then Y has been reached".

The rest of the paper is actually about objections to the core idea that "it could ever be possible for a machine to think", and the imitation game is kind of a lead-in or introduction to Turing's treatment of various "it would be impossible for a machine to think" arguments, starting with the theological argument that "only a human soul can think, hence no animal or machine can think" ... since it was the 1950s.

[–] [email protected] 6 points 11 months ago* (last edited 11 months ago)

I don’t understand how the Chinese room is a valuable argument. To me, while the person inside the room doesn’t understand Chinese, the room-person-instructions system does. You don’t argue that you don’t understand your language because none of your individual neurons understand it.

I don’t claim that ChatGPT “understands” the language, I just don’t think that this argument applies in general.


[–] [email protected] 57 points 11 months ago* (last edited 11 months ago) (2 children)

Title:

ChatGPT broke the Turing test

Content:

Other researchers agree that GPT-4 and other LLMs would probably now pass the popular conception of the Turing test. [...]

researchers [...] reported that more than 1.5 million people had played their online game based on the Turing test. Players were assigned to chat for two minutes, either to another player or to an LLM-powered bot that the researchers had prompted to behave like a person. The players correctly identified bots just 60% of the time

Complete contradiction. Trash Nature, it's become only an extremely expensive gossip science magazine.

PS: The Turing test involves comparing a bot with a human (not knowing which is which). So if more and more bots pass the test, this can be the result either of an increase in the bots' Artificial Intelligence, or of an increase in humans' Natural Stupidity.

[–] [email protected] 13 points 11 months ago (1 children)

So if more and more bots pass the test, this can be the result either of an increase in the bots’ Artificial Intelligence, or of an increase in humans’ Natural Stupidity.

Or it "simply" plays with human biases, which are very natural. Stuff like seeing faces in everything that somewhat resembles two eyes and a mouth (or sometimes just the eyes and a head-like shape etc.) is pretty hard-wired. We have similar biases with regard to language. If something reads like it was written by a human, we immediately sympathize with it. Which is also the reason these LLMs are so successful and cause so many people to fear our AI overlords are right around the corner. Simply because the language is good, we go into "damn, that's like a human" mode.

[–] [email protected] 7 points 11 months ago

Agree (you made me think of the famous face on Mars). I meant that more as a joke. Also there's no clear threshold or divide on one side of which we can speak of "human intelligence". There's a whole range from impairing disabilities to Einstein and Euler, if it really makes sense to use a linear 1D scale, which it very probably doesn't.

[–] [email protected] 8 points 11 months ago* (last edited 11 months ago)

Also, the Turing Test isn't some holy grail of AI. It's just a thought experiment, and not even the highest test for an AI that we can think of. Passing it is impressive, don't get me wrong, but unlike what clickbait articles would tell you, it does not automatically mean an AI is sentient or is smarter than humans or anything like that. It means it passed the thought experiment, nothing more.

Also also, ChatGPT was not the first AI to pass the Turing Test. Actually, plenty have, even over a decade before.

[–] [email protected] 47 points 11 months ago (5 children)

There is the capitalist alternative to the Turing test: Have ChatGPT get a job. Hook it up to the Web, let it find itself a work-from-home job and go to work. Can it make as much money as a human, can it make enough money to pay for its own survival? Will it get fired?

[–] [email protected] 23 points 11 months ago* (last edited 11 months ago)

That just sounds like a recipe for breeding robot sociopaths. It'll find its way into management and doom us all.

[–] [email protected] 18 points 11 months ago

Will it get promoted, start managing people, start investing, start its own companies, and quickly take over the world?


[–] [email protected] 19 points 11 months ago* (last edited 11 months ago) (1 children)

Funny, I don't see much talk in this thread about Francois Chollet's Abstraction and Reasoning Corpus, which is emphasised in the article. It's a really neat take on how to understand the ability of thought.

A couple of things that stick out to me about GPT-4 and the like are the lack of understanding in the realms that require multimodal interpretations, the inability to break down word and letter relationships due to tokenization, lack of true emotional ability, and similarity to the "leap before you look" aspect of our own subconscious ability to pull words out of our own ass. Imagine if you could only say the first thing that comes to mind without ever thinking or correcting before letting the words out.
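On the tokenization point, a toy sketch may help show why letter-level questions are awkward for these models. This is not GPT-4's actual tokenizer: the `VOCAB` table and its IDs are made up, and greedy longest-match is used here only as a simple stand-in for the learned subword schemes real models use.

```python
# Hypothetical toy vocabulary; real LLM tokenizers learn far more pieces.
VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def tokenize(word: str) -> list[int]:
    """Greedy longest-match split of a word into vocabulary IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink.
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no vocabulary piece covers {word[i]!r}")
    return ids
```

With this vocabulary, `tokenize("strawberry")` yields just two IDs, one for "straw" and one for "berry". The model downstream only ever sees those IDs, so the individual letters inside each piece are not directly visible to it.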

I'm curious about what things will look like after solving those first couple problems, but there's even more to figure out after that.

Going by recent work I enjoy from Earl K. Miller, we seem to have oscillatory cycles of thought which are directed by wavelengths in a higher dimensional representational space. This might explain how we predict and react, as well as hold a thought to bridge certain concepts together.

I wonder if this aspect could be properly reconstructed in a model, or from functions built around concepts like the "tree of thought" paper.

It's really interesting comparing organic and artificial methods and abilities to process or create information.


[–] [email protected] 16 points 11 months ago (1 children)

Please let's not start measuring AI success by how successfully capitalist they can be. I'm not exactly an anti-capitalist, but I think that could only end in tears.


[–] [email protected] 14 points 11 months ago

Ironically, ChatGPT also fails the Turing test by being so competent that no human could match that.

[–] [email protected] 14 points 11 months ago

Honestly, though, I can't even decide whether other people have consciousness. Cogito ergo sum, if you know what I'm talking about.

[–] [email protected] 11 points 11 months ago (4 children)

What about the Voight-Kampff test? What would it do if it saw a turtle in the desert?


[–] [email protected] 5 points 11 months ago (2 children)

How does ChatGPT do with the Winograd schema? That's a lot harder to fake: https://en.m.wikipedia.org/wiki/Winograd_schema_challenge
