Discussion about this post

Odd anon:

> But question two already has a red flag: it re-starts the numbering from 1:

Unless I'm very much mistaken, ChatGPT did not actually print that number 1. It looks like it's supposed to be an ordered-list element, which normally increments automatically; here the count was reset because its answers arrive as separate messages.

As has already been pointed out in the subreddit thread linking here, this is all about outdated tech. GPT-3.5 (from late 2022) fails this test, while GPT-4 (from early 2023) passes it.

People have tried to find consistent "gaps" in GPT-4's abilities, and largely failed. The best effort at this is the GAIA benchmark, developed by a team of researchers from Meta-FAIR, Meta-GenAI, HuggingFace, and AutoGPT specifically to be easy for humans and difficult for LLMs. They "succeeded": GPT-4 averaged a score of only 15%, while their human respondents scored around 92%. Only one problem: I highly doubt even one in a thousand humans could actually score that high. Certainly I can't.

This is where we are now: so long as there is some skill at which at least one human can outperform the LLM, we say it doesn't really understand. Do we treat anything else that way? If you played twenty questions with a random stranger and, surprise surprise, they slipped up the way ChatGPT does here, would you say they don't really understand anything?

Or, to put it differently: if some freak happenstance caused a chicken to start communicating like GPT-4, demonstrating the same level of understanding that GPT-4 does, would you be comfortable eating it? Is this really where you draw the line?

ID:

I disagree that this proves much of anything. It’s the mirror image of people who ask an LLM to write a limerick about transistors, then point to it and say “How can you deny this has intelligence?!”

If the LLM has any intelligence (I think it does!), then it’s an alien intelligence, and you shouldn’t expect your intuitions to match it. “Intelligence” doesn’t fit on one axis even for humans. This means LLMs are bad at some things they seem like they should be good at. They can’t even play tic-tac-toe reliably without prompting!

One way LLMs are unintuitive is that they can’t take time to reason: notice that they respond at the same rate no matter the complexity of the question.

Maybe try searching for their sparks of intelligence instead of proposing yet another Turing test? We have thousands of those at this point.
