Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds. Researchers found wild fluctuations, called drift, in the technology’s abi…

7 points

GPT was always really bad at math.

I’ve asked it word problems before and it fails miserably, giving me answers that make no sense. For example, I was once curious how many stars you would expect to find in a region of the Milky Way with a radius of 650 light years, assuming an average spacing of 4 light years between stars. The first answer it gave me was something like a trillion stars. I asked it whether that made sense, a trillion stars in a subset of a galaxy known to contain only about a quarter of that number, and it gave me a wildly different answer. I asked it to check again and it gave me a third wildly different number.

Sometimes it doubles down on wrong answers.

GPT is amazing but it’s got a long way to go.
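
For reference, the star-count question above can be worked as a back-of-envelope estimate. This is only a sketch under stated assumptions: the region is treated as a sphere, and "4 light years per star" is read as a mean spacing, so each star is given a cube 4 light years on a side. The function name `estimate_star_count` is made up for illustration.

```python
import math


def estimate_star_count(radius_ly: float, spacing_ly: float) -> float:
    """Rough star count for a spherical region.

    Assumption (not from the original question): each star occupies
    a cube whose side equals the mean spacing between stars.
    """
    region_volume = (4.0 / 3.0) * math.pi * radius_ly ** 3  # cubic light years
    volume_per_star = spacing_ly ** 3                        # cubic light years
    return region_volume / volume_per_star


if __name__ == "__main__":
    n = estimate_star_count(650, 4)
    print(f"about {n:.1e} stars")
```

Under those assumptions the estimate comes out in the tens of millions, many orders of magnitude below a trillion. A different packing assumption changes the constant but not the order of magnitude.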

6 points

I used GPT-4 the other day and it worked perfectly for calculating the formulas of straight lines on linear-log plots, but maybe I was the 2%.
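
For anyone wondering what that kind of calculation looks like, here is a minimal sketch. It assumes "linear-log" means a linear x-axis and a base-10 logarithmic y-axis, so a straight line on the plot corresponds to y = A * 10^(m * x). The function name `semilog_line` is hypothetical.

```python
import math


def semilog_line(p1, p2):
    """Return (A, m) such that y = A * 10**(m * x) passes through p1 and p2.

    Assumes a linear x-axis and a base-10 log y-axis, so both y values
    must be positive and the two x values must differ.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("x values must differ")
    if y1 <= 0 or y2 <= 0:
        raise ValueError("y values must be positive on a log axis")
    m = (math.log10(y2) - math.log10(y1)) / (x2 - x1)  # slope in decades per unit x
    A = y1 / 10 ** (m * x1)                            # value of y at x = 0
    return A, m


if __name__ == "__main__":
    A, m = semilog_line((0, 1), (2, 100))
    print(A, m)
```

If the log axis is x instead, the same idea applies with the roles swapped: y = a + b * log10(x).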

30 points

This paper is pretty unbelievable to me in the literal sense. From a quick glance:

First of all, they couldn’t even be bothered to check for simple spelling mistakes. Second, all they’re doing is asking whether a number is prime or not and then extrapolating the results to be representative of solving math problems.

But most importantly, I don’t believe for a second that the same model with a few adjustments over a 3-month period would completely flip performance on any representative task. I suspect there’s something seriously wrong with how they collect/evaluate the answers.

And finally, according to their own results, GPT-3.5 did significantly better in the second evaluation. So this title is a blatant misrepresentation.

Also the study isn’t peer-reviewed.
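
To make the evaluation concern concrete: whether a number is prime has a deterministic ground truth, so yes/no answers can be scored mechanically. The sketch below is not the paper's method; `is_prime` uses plain trial division, and `score_answers` and the sample inputs are hypothetical.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test, fine for small inputs."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def score_answers(numbers, answers):
    """Fraction of yes/no answers (True = "prime") matching the ground truth."""
    if not numbers or len(numbers) != len(answers):
        raise ValueError("need equal-length, non-empty inputs")
    correct = sum(is_prime(n) == a for n, a in zip(numbers, answers))
    return correct / len(numbers)


if __name__ == "__main__":
    primes_only = [7, 11, 13, 97]  # hypothetical, deliberately unbalanced set
    print(score_answers(primes_only, [True] * 4))   # always answers "yes"
    print(score_answers(primes_only, [False] * 4))  # always answers "no"
```

On an unbalanced set like this hypothetical one, a model that merely switches its default answer from "yes" to "no" swings from a perfect score to zero without any change in actual ability, which is one way a collection or scoring problem could produce a dramatic flip.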

4 points

“AI” taking our jobs and all that, huh?

