Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

jabathekek@sopuli.xyz

190 points

1 day ago

WhatAmLemmy@lemmy.world

74 points

23 hours ago

The results of this new GSM-Symbolic paper aren’t completely new in the world of AI research. Other recent papers have similarly suggested that LLMs don’t actually perform formal reasoning and instead mimic it with probabilistic pattern-matching of the closest similar data seen in their vast training sets.

WTF kind of reporting is this, though? None of this is recent or new at all, like in the slightest. I am shit at math, but have a high-level understanding of statistical modeling concepts, mostly as of a decade ago, and even I knew this. I recall a stats PhD describing models as “stochastic parrots”; nothing more than probabilistic mimicry. It was obviously no different the instant LLMs came on the scene. If only tech journalists bothered to do a superficial amount of research, instead of being spoon-fed spin from tech bros with a profit motive…

no banana@lemmy.world

39 points

23 hours ago

It’s written as if they literally expected AI to be self reasoning and not just a mirror of the bullshit that is put into it.

Sterile_Technique@lemmy.world

33 points

22 hours ago

Probably because that’s the common expectation due to calling it “AI”. We’re well past the point of putting the lid back on that can of worms, but we really should have saved that label for… y’know… intelligence, that’s artificial. People think we’ve made an early version of Halo’s Cortana or Star Trek’s Data, and not just a spellchecker on steroids.

The day we make actual AI is going to be a really confusing one for humanity.

jabathekek@sopuli.xyz

12 points

17 hours ago

describing models as “stochastic parrots”

That is SUCH a good description.

fluxion@lemmy.world

7 points

23 hours ago

Clearly this sort of reporting is not prevalent enough given how many people think we have actually come up with something new these last few years and aren’t just throwing shitloads of graphics cards and data at statistical models

aesthelete@lemmy.world

6 points

11 hours ago

If only tech journalists bothered to do a superficial amount of research, instead of being spoon fed spin from tech bros with a profit motive…

This is outrageous! I mean the pure gall of suggesting journalists should be something other than part of a human centipede!

seaQueue@lemmy.world

17 points

17 hours ago

jabathekek@sopuli.xyz

10 points

12 hours ago

*starts sweating

Look at that subtle pixel count, the tasteful colouring… oh my god, it’s even transparent…

The Snark Urge@lemmy.world

88 points

1 day ago

One time I exposed deep cracks in my calculator’s ability to write words with upside down numbers. I only ever managed to write BOOBS and hELLhOLE.

LLMs aren’t reasoning. They can do some stuff okay, but they aren’t thinking. Maybe if you had hundreds of them with unique training data all voting on proposals you could get something along the lines of a kind of recognition, but at that point you might as well just simulate cortical columns and try to do Jeff Hawkins’ idea.

noodlejetski@lemm.ee

44 points

1 day ago

LLMs aren’t reasoning. They can do some stuff okay, but they aren’t thinking

and the more people realize it, the better. which is why it’s good that research like that from a reputable company makes headlines.

Heliumfart@sh.itjust.works

2 points

16 hours ago

What about boobless?

The Snark Urge@lemmy.world

3 points

15 hours ago

anon_8675309@lemmy.world

75 points

15 hours ago

Did anyone believe they had the ability to reason?

Kairos@lemmy.today

31 points

13 hours ago

Yes

Aeri@lemmy.world

15 points

8 hours ago

People are stupid, OK? I’ve met people who think that it can in fact do math “better than a calculator”.

Semperverus@lemmy.world

-15 points

10 hours ago

I still believe they have the ability to reason in a very limited capacity. Everyone says that they’re just very sophisticated parrots, but there is something emergent going on. These AIs need to have a world-model inside of themselves to be able to parrot things as correctly as they currently do (yes, including the hallucinations and the incorrect answers). Sure, they are using tokens instead of real dictionary words, which comes with things like the strawberry problem, but just because they are not nearly as sophisticated as us doesn’t mean there is no reasoning happening.

We are not special.

galanthus@lemmy.world

9 points

9 hours ago

If the only thing you feed an AI is words, then how would it possibly understand what these words mean if it does not have access to the things the words are referring to?

If it does not know the meaning of words, then what can it do but find patterns in the ways they are used?

This is a shitpost.

We are special, I am in any case.

Semperverus@lemmy.world

-4 points

8 hours ago

It is akin to the relativity problem in physics. Where is the center of the universe? What “grid” do things move through? The answer is that everything moves relative to one another, and somehow that fact causes the phenomena in our universe (and in these language models) to emerge.

Likewise, our brains do a significantly more sophisticated but not entirely different version of this. There are more “cores” in our brains that are good at different tasks that all constantly talk back and forth between each other, and our frontal lobe provides the advanced thinking and networking on top of that. The LLMs are more equivalent to Broca’s area; they haven’t built out the full frontal lobe yet (or rather, the “Multiple Demand network”).

You are right in that an AI will never know what an apple tastes like, or what a breeze on its face feels like until we give them sensory equipment to read from.

In this case though, it’s the equivalent of a college student having no real-world experience and only the knowledge from their books, lectures, and labs. You can still work with the concepts of, and reason about, things you have never touched if you are given enough information about them beforehand.

Excrubulent@slrpnk.net

5 points

4 hours ago

It’s an illusion. People think that because the language model puts words into sequences like we do, there must be something there. But we know for a fact that it is just word associations. It is fundamentally just predicting the most likely next word and generating it.

If it helps, we have something akin to an LLM inside our brain, and it does the same limited task. Our brains have distinct centres that do all sorts of recognition and generative tasks, including images, sounds and language. We’ve made neural networks that do these tasks too, but the difference is that we have a unifying structure that we call “consciousness” that is able to grasp context, and is able to loop the different centres back into one another to achieve all sorts of varied results.

So we get our internal LLM to sequence words, one word after another, then we loop back those words via the language recognition centre into the context engine, so it can check if the words match the message it intended to create, it checks them against its internal model of the world. If there’s a mismatch, it might ask for different words till it sees the message it wanted to see. This can all be done very fast, and we’re barely aware of it. Or, if it’s feeling lazy today, it might just blurt out the first sentence that sprang to mind and it won’t make sense, and we might call that a brain fart.

Back in the 80s “automatic writing” took off, which was essentially people tapping into this internal LLM and just letting the words flow out without editing. It was nonsense, but it had this uncanny resemblance to human language, and people thought they were contacting ghosts, because obviously there has to be something there, right? But there’s not, it’s just that it sounds like people.

These LLMs only produce text forwards; they have no ability to create a sentence, then examine that sentence and see if it matches some internal model of the world. They have no capacity for context. That’s why any question involving A inside B trips them up, because that is fundamentally a question about context. “How many Ws are in the sentence ‘Howard likes strawberries’?” is a question about context; that’s why they screw it up.

I don’t think you solve that without creating a real intelligence, because a context engine would necessarily be able to expand its own context arbitrarily. I think allowing an LLM to read its own words back and do some sort of check for fidelity might be one way to bootstrap a context engine into existence, because that check would require it to begin to build an internal model of the world. I suspect the processing power and insights required for that are beyond us for now.

trolololol@lemmy.world

3 points

10 hours ago

What’s the strawberry problem? Does it think it’s a berry? I wonder why

xthexder@l.sw0.com

10 points

10 hours ago

I think the strawberry problem is to ask it how many R’s are in strawberry. Current AI gets it wrong almost every time.
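
For contrast, here is a minimal sketch of the same question answered by looking at characters directly. The token split is made up purely for illustration (real tokenizers split words differently); the point is only that a model working on chunks like these never sees individual letters.

```python
def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text."""
    return text.lower().count(letter.lower())


# Hypothetical subword split, for illustration only.
# An actual tokenizer may cut the word in different places.
hypothetical_tokens = ["str", "aw", "berry"]

if __name__ == "__main__":
    # Character-level view: the count is trivial.
    print(count_letter("strawberry", "r"))
    # Token-level view: the letters are hidden inside opaque chunks.
    print(hypothetical_tokens)
```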

7 points

10 hours ago

Ask an LLM how many Rs there are in strawberry

N0body@lemmy.dbzer0.com

50 points

15 hours ago

The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding “seemingly relevant but ultimately inconsequential statements” to the questions

Good thing they’re being trained on random posts and comments on the internet, which are known for being succinct and accurate.
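
To make the quoted idea concrete, here is a rough sketch of that kind of perturbation. This is not the paper's code or data: the template, the names, and the distractor sentence are all invented for illustration. It only shows that an inconsequential clause can be added to a word problem without changing the correct answer.

```python
import random

# Hypothetical names and distractor text, invented for this example.
NAMES = ["Ava", "Ben", "Chloe"]
DISTRACTOR = "Five of them were a bit smaller than average. "


def make_problem(rng: random.Random, with_distractor: bool) -> tuple[str, int]:
    """Build one templated word problem and its ground-truth answer.

    The distractor sentence mentions a number but does not affect the total.
    """
    name = rng.choice(NAMES)
    a = rng.randint(10, 60)
    b = rng.randint(10, 60)
    extra = DISTRACTOR if with_distractor else ""
    question = (
        f"{name} picks {a} apples on Friday and {b} apples on Saturday. "
        f"{extra}How many apples does {name} have in total?"
    )
    return question, a + b


if __name__ == "__main__":
    # Same seed for both variants, so only the distractor differs.
    plain_q, plain_answer = make_problem(random.Random(0), with_distractor=False)
    noisy_q, noisy_answer = make_problem(random.Random(0), with_distractor=True)
    print(plain_q)
    print(noisy_q)
    print(plain_answer == noisy_answer)
```

A system that actually reasons about the problem should give the same answer for both variants; one that pattern-matches on surface features may try to subtract the five.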

blind3rdeye@lemm.ee

20 points

12 hours ago

Yeah, especially given that so many popular vegetables are members of the brassica genus

MoogleMaestro@lemmy.zip

5 points

10 hours ago

Absolutely. It would be a shame if AI didn’t know that the common maple tree is actually placed in the family cannabaceae.

emerald@lemmy.blahaj.zone

41 points

17 hours ago

statistical engine suggesting words that sound like they’d probably be correct is bad at reasoning

How can this be??

Siegfried@lemmy.world

18 points

15 hours ago

I would say that if anything, LLMs are showing cracks in our way of reasoning.

MoogleMaestro@lemmy.zip

11 points

10 hours ago

Or the problem with tech billionaires selling “magic solutions” to problems that don’t actually exist. Or how people are too gullible in the modern internet to understand when they’re being sold snake oil in the form of “technological advancement” when it’s actually just repackaged plagiarized material.

feedum_sneedson@lemmy.world

1 point

2 hours ago

But what if they’re wearing an expensive leather jacket
