Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds. Researchers found wild fluctuations—called drift—in the technology’s abi...(fortune.com)

posted 1 year ago

ChatGPT went from answering a simple math problem correctly 98% of the time to just 2% over the course of a few months.


CaptainAniki@lemmy.flight-crew.org

66 points

1 year ago

At the start I used to use ChatGPT to help me write really rote and boring code, but now it’s not even useful for that. Half the stuff it sends me (very basic functions) LOOKS correct but doesn’t return the correct values, or the parameters are completely wrong, or it’s missing something absolutely critical.


Boinketh@lemm.ee

22 points

1 year ago

Deleted by creator


aquinteros@lemmy.world

12 points

1 year ago

idk what you guys mean, but GitHub Copilot still works really well: the suggestions are fast and precise, with little tweaks here and there… and GPT-4 with Code Interpreter is an absolute game changer… idk about basic ChatGPT 3.5 Turbo though


danwardvs@sh.itjust.works

8 points

1 year ago

GitHub Copilot is a bit different: it’s powered by OpenAI Codex, which is trained on public GitHub repos. And yes, it’s quite effective!


blue_zephyr@lemmy.world

30 points

1 year ago

This paper is pretty unbelievable to me in the literal sense. From a quick glance:

First of all, they couldn’t even be bothered to check for simple spelling mistakes. Second, all they’re doing is asking whether a number is prime or not and then extrapolating the results to be representative of solving math problems.

But most importantly I don’t believe for a second that the same model with a few adjustments over a 3 month period would completely flip performance on any representative task. I suspect there’s something seriously wrong with how they collect/evaluate the answers.

And finally, according to their own results, GPT-3.5 did significantly better at the second evaluation. So this title is a blatant misrepresentation.

Also the study isn’t peer-reviewed.
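To make the concern about how answers get collected and evaluated concrete, here is a rough sketch of what scoring an "is this number prime?" test could look like. Everything in it is an assumption for illustration: `parse_answer`, the sample replies, and the all-primes test set are made up, and none of it is taken from the paper. It only shows that if a test set is lopsided, a model that flips from mostly saying "yes" to mostly saying "no" can swing from near 100% to near 0% without its actual math ability changing much.

```python
from math import isqrt
from typing import Optional


def is_prime(n: int) -> bool:
    """Ground truth via simple trial division."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def parse_answer(reply: str) -> Optional[bool]:
    """Hypothetical parser: reads a leading yes/no from a model reply."""
    text = reply.strip().lower()
    if text.startswith("yes"):
        return True
    if text.startswith("no"):
        return False
    return None  # unparseable replies never match, so they count as wrong


def accuracy(numbers, replies) -> float:
    """Fraction of replies whose parsed answer matches the ground truth."""
    correct = sum(parse_answer(r) == is_prime(n) for n, r in zip(numbers, replies))
    return correct / len(numbers)


# Made-up data: a test set that happens to contain only primes.
numbers = [7, 11, 13, 17]
always_yes = ["Yes, it is prime."] * len(numbers)
always_no = ["No, it is not prime."] * len(numbers)

print(accuracy(numbers, always_yes))  # 1.0
print(accuracy(numbers, always_no))   # 0.0
```

A balanced mix of primes and composites would make a blanket "yes" or "no" score around 50% instead, which is why the makeup of the test set matters as much as the headline number.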
