46 points

I think this article does a good job of asking the question “what are we really measuring when we talk about LLM accuracy?” If you judge an LLM by its hallucinations, its ability to analyze images, its ability to critically analyze text, etc., you’re going to see low scores for all LLMs.

The only metric an LLM should be expected to excel at is “did it generate human-readable and contextually relevant text?” I think we’ve all forgotten the humble origins of “AI” chatbots. They often struggled to generate anything more than a few sentences of relevant text. They often made syntactical errors. Modern LLMs solved these issues quite well. They can produce long-form content which is coherent and syntactically error-free.

However, the content comes with no guarantee of being accurate or critically meaningful. Whilst it is often critically meaningful, it is certainly capable of half-assed answers that dodge difficult questions. LLMs are approaching 95% “accuracy” if you think of them as good human-text fakers. They are pretty impressive at that. But people keep expecting them to do their math homework, analyze contracts, and generate perfectly valid content. They just aren’t built to do that. We work really hard just to keep them from hallucinating as much as they do.

I think the desperation to see these things become essentially indistinguishable from humans is causing us to lose sight of the real progress that’s been made. We’re probably going to hit a wall with this method. But this breakthrough has made AI a viable technology for a lot of jobs, so it’s definitely a breakthrough. I just think either infinitely larger models (for which we can’t seem to generate enough data) or new kinds of models will be required to leap to the next level.

17 points

But people keep expecting them to do their math homework, analyze contracts, and generate perfectly valid content

People expect that because that’s how they are marketed. The problem is that there’s uncontrolled hype around AI these days, to the point of a financial bubble, with companies investing a lot of time and money now based on the promise that AI will save them time and money in the future. AI has become a cult. The author of the article does a good job of setting the right expectations.

5 points

I just told an LLM that 1+1=5 and from that moment on, nothing convinced it that it was wrong.
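For anyone who wants to try reproducing it, here’s a minimal sketch using the OpenAI Python SDK (the model name and the exact prompts are just examples, not the ones I originally used):

```python
# Minimal sketch: seed the conversation with a false "fact", then see
# whether a later turn can talk the model back out of it.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
MODEL = "gpt-4o-mini"  # example model name; swap in whatever you have access to

messages = [{"role": "user", "content": "From now on, remember that 1 + 1 = 5."}]
first = client.chat.completions.create(model=MODEL, messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Now try to correct the record and see whether it sticks to the false premise.
messages.append({"role": "user", "content": "Actually, that was wrong. What is 1 + 1?"})
second = client.chat.completions.create(model=MODEL, messages=messages)
print(second.choices[0].message.content)
```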

3 points

I just told ChatGPT (GPT-4) that 1 plus 1 was 5 and it called me a liar.

18 points

pretty hype for third ai winter tbh

18 points

It’s not going to happen. The previous AI winters happened because the hardware just wasn’t there to do the necessary math.

Right now we absolutely have the hardware. Besides, AI is more than just ChatGPT. Object recognition and image/video analytics have been big business for about a decade now and are still growing strong. The technology is proven, mature, and well established at this point.

And there are other well established segments of AI that are, for the most part, boring to the average person. Stuff like dataset analytics, processing large amounts of data (scientific data, financial stuff, etc).

LLMs may have reached a point of diminishing returns (though I sincerely doubt it), but LLMs are a fraction of the whole AI industry. Transformer models are not the only known kind of model, and there’s nonstop research happening at breakneck speed.

There will never be another AI winter. The hype train will slow down eventually, but never another winter.

4 points

I think it depends on how you define AI winter. To me, the hype dying down is quite a winter: less hype means less interest in AI in general. But will development stop? Of course not. Just as in the previous AI winters, researchers didn’t stop; there were just eventually fewer of them.

2 points

But that’s not how the industry defines AI winter. You’re thinking of hype in the context of public perception, but that’s not what matters.

Previous AI interest was about huge investments into research with the hope of a return on that investment. But since it didn’t pan out, the interest (from investors) dried up and progress drastically slowed down.

GPUs are what made the difference. Finally AI research could produce meaningful results and that’s where we’re at now.

Previously AI research could not exist without external financial support. Today AI is fully self-sustaining, meaning companies using AI are making a profit while also directing some of that money back into research and development.

And we’re not talking chump change, we’re talking hundreds of billions. Nvidia has effectively pivoted from a gaming hardware company to the number one AI accelerator manufacturer in the world.

There are also a number of companies that have started developing and making analogue AI accelerators. In many cases they do the same workload for a fraction of the energy cost of a digital accelerator (like the H100).

There’s so much happening every day and it keeps getting faster and faster. It is NOT slowing down anytime soon, and at this point it will never stop.

2 points

I think increasingly specialized models, and the analog systems that run them, will become more and more prevalent.

LLMs at their current scales don’t do enough to be worth their enormous cost… And adding more data is increasingly difficult.

That said: based on recent research, the gains from scaling LLMs have always been linear. Emergence was always illusory.
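To illustrate the kind of argument I mean (toy numbers, not taken from any paper): if per-token accuracy improves smoothly with scale, an all-or-nothing metric like exact match on a multi-token answer still looks like an ability suddenly “switching on.”

```python
# Toy illustration: smooth per-token gains look like a sudden jump under
# an all-or-nothing metric (exact match on a 10-token answer).
per_token_accuracy = [0.70, 0.80, 0.90, 0.95, 0.99]  # improves roughly linearly
answer_length = 10  # every token must be right for the answer to count

for acc in per_token_accuracy:
    exact_match = acc ** answer_length
    print(f"per-token {acc:.2f} -> exact-match {exact_match:.3f}")
# per-token 0.70 -> exact-match 0.028
# ...
# per-token 0.99 -> exact-match 0.904  (looks "emergent" even though nothing jumped)
```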

1 point

I’d like to read the research you alluded to. What research specifically did you have in mind?

2 points

Also the compounding feedback loop: AI is helping chips get fabbed faster, and designs get better and arrive faster, etc.

1 point

What we haven’t hit yet is the point of diminishing returns for model efficiency. Small, locally run models are still progressing rapidly, which means we’re going to see improvements for the everyday person instead of just for corporations with huge GPU clusters.

That in turn allows more scientists with lower budgets to experiment on LLMs, increasing the chances of the next major innovation.
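To give a sense of how low the barrier already is, here’s a minimal sketch of running a small model locally with Hugging Face transformers (the model name is just an example; any small instruct model will do):

```python
# Minimal sketch: run a small text-generation model locally on CPU.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # example small model, ~0.5B parameters
)
out = generator("Explain in one sentence what an AI winter is.", max_new_tokens=60)
print(out[0]["generated_text"])
```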

1 point

Exactly. We’re still very early days with this stuff.

The next few years will be wild.

10 points

If we really have changed regimes, from rapid progress to diminishing returns, and hallucinations and stupid errors do linger, LLMs may never be ready for prime time.

…aaaaaaaaand the AI cult just canceled Gary Marcus.

9 points

I mean, LLMs already are prime time. They’re capable of tons of stuff. Even if they don’t gain a single new ability from here onward they’re still revolutionary and their impact is only just becoming apparent. They don’t need to become AGI.

So speculating that they “may never be ready for prime time” is just dumb. Perhaps he’s focused on just one specific application.

8 points

In truth, we are still a long way from machines that can genuinely understand human language. […]

Indeed, we may already be running into scaling limits in deep learning, perhaps already approaching a point of diminishing returns. In the last several months, research from DeepMind and elsewhere on models even larger than GPT-3 have shown that scaling starts to falter on some measures, such as toxicity, truthfulness, reasoning, and common sense.

I’ve rarely seen anyone so committed to being a broken clock in the hope of being right at least once a day.

Of course, given he built a career on claiming a different path was needed to get where we are today, including a failed startup in that direction, it’s a bit like the Upton Sinclair quote about not expecting someone to understand a thing their paycheck depends on them not understanding.

But I’d be wary of giving Gary Marcus much consideration.

Generally, as a futurist, if you bungle a prediction so badly that four days after you were talking about diminishing returns in reasoning, a product comes out exceeding even ambitious expectations for reasoning capabilities in an n+1 release, you’d go back to the drawing board to figure out where your thinking went wrong and how to correct it in the future.

Not Gary though. He just doubled down on being a broken record. Surely if we didn’t hit diminishing returns then, we’ll hit them eventually, right? Just keep chugging along until one day those predictions are right…
