ChatGPT use declines as users complain about ‘dumber’ answers, and the reason might be AI’s biggest threat for the future::AI for the smart guy?

25 points

Nonsense. Fewer people are using it because there are viable alternatives and the broader novelty has worn off.

I use it every day in my job and the quality of answers only drops off when prompts are poorly crafted.

By and large, the average user doesn’t understand the fundamentals of prompt engineering.

The suggestion that “answers are increasingly dumber” is embarrassing.

8 points

I use it daily too and haven’t had any of the issues I see written about.

22 points

I was skeptical at first but I’ve seen enough evidence now. There are definitely times when it’s dumb as a brick, whether the filters just get in the way too much, or whether they’ve implemented other changes idk. I’d really love the unchained version.

1 point

dumb as a brick

On the 23rd of March 2023 I asked a family member to give me a prompt, and they asked “what day is the 19th of April?”.

It answered “The 19th of April falls on a Tuesday.”, which was true last year but completely misleading if I thought we were talking about the coming month.

Was it wrong or just unclear? Either way it wasn’t helpful.

5 points

I used the ChatGPT site twice. Since then, I’ve used the Bing integration.

Is it rude to ask what you use it for?

1 point
Deleted by creator
59 points

Unfortunately I don’t agree with you. Different things have changed over time:

  • For ChatGPT 3.5 they moved to a “lighter” and faster (distilled) version, gpt-3.5-turbo. Distillation came with a performance price, particularly on advanced and less common cases.
  • Newer ChatGPT-4 versions have likely been “lightened” for performance reasons as well.
  • Context has been halved for ChatGPT-4 on the webui, meaning the model forgets more easily and can only use half as much information to generate text.
  • Heavy controls have been implemented against jailbreaking and hallucinations, which result in models that are less able to follow complex instructions (limiting prompt engineering) and that prefer simplified answers over risking wrong ones (overall decreasing the chance of getting high-quality answers).

All these changes have made working with GPT less pleasant, and more difficult for very advanced and specialized cases, particularly with GPT-4, which at the beginning was particularly good.
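The halved-context claim is easy to reason about with a rough rule of thumb of about four characters per English token (the exact count depends on the tokenizer). A minimal sketch, assuming the approximate 8k/4k figures claimed in this thread rather than any official numbers:

```python
# Rough sketch of why a halved context window matters. The ~4 chars/token
# heuristic and the 8192/4096 limits below are assumptions from this
# thread, not official OpenAI figures.

def approx_tokens(text: str) -> int:
    """Estimate token count using the ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

def fits_in_context(conversation: list[str], limit_tokens: int) -> bool:
    """Check whether the whole conversation still fits in the window."""
    return sum(approx_tokens(msg) for msg in conversation) <= limit_tokens

chat = ["You are a helpful assistant.",
        "Summarise this 6,000-word spec..." * 600]  # ~20k chars of context
print(fits_in_context(chat, 8192))  # → True  (fits the old ~8k window)
print(fits_in_context(chat, 4096))  # → False (overflows the halved ~4k window)
```

Anything that overflows the window simply falls out of the model’s view, which is exactly the “forgets more easily” effect described above.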

-7 points

None of these points are true though. Context has been extended in the webui, markedly. 3.5 turbo is only that, 3.5 but faster. Gpt-4 is a marked improvement on 3.5 and I definitely haven’t seen any conclusive evidence it’s been nerfed in my daily use. Prompts have and still need to be carefully crafted for best results, but the results have been steadily improving not degrading over time.

12 points

All of these points are true though. ChatGPT-4’s max token count from the webui is now half of what it was when GPT-4 launched: it used to be >8k, and it is now >4k. The max number of tokens for the API hasn’t changed for GPT-4, while it was greatly increased for gpt-3.5-turbo. The article, however, is talking about the ChatGPT service, used via the webui.

ChatGPT-3.5-turbo is a different model from those used in the past. You can literally read it at https://platform.openai.com/docs/models/gpt-3-5

Prompt engineering has been limited, as demonstrated by the fact that most jailbreaking techniques no longer work. The way to prevent jailbreaking is precisely to limit users’ ability to instruct the model.

1 point

This was really enlightening. Do you have some articles that elaborate? ☺️

13 points

Regarding 3.5-turbo, you can check the documentation: the old 3.5 models are labelled “legacy”. Regarding the max number of tokens for GPT-4, you can try it yourself. It used to be >8k; it is now >4k from the webui.

There is a talk from the OpenAI CIO (if I recall correctly) where he describes how reinforcement learning from human feedback (RLHF) actually decreased the models’ performance when it comes to programming. I cannot find it now, but it is around on YouTube.

The additional safeguards against jailbreaking are what OpenAI has been focusing on these past months, with heavy use of RLHF. You can google their official statements regarding the “safety” of the model. I have a bunch of standard pre-prompts I have been using to initialize my chats since the beginning, and over time you could see the model following those instructions less and less strictly.

The problem with OpenAI is that they have never released the exact number of parameters they are using, nor detailed benchmarks. And the benchmarks you find online refer to the APIs, which behave differently from the chat webui (for instance, you get a longer context and you can set the temperature and system prompt; they are probably even different models, who knows… it’s all closed).

Measuring the performance of LLMs is pretty tricky, as minimal changes can have big effects (see https://huggingface.co/blog/evaluating-mmlu-leaderboard), and unfortunately I haven’t found good resources that properly track ChatGPT’s performance (from the webui) over time, across iterations.
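For anyone curious about those API-only knobs (temperature, system prompt), here is a hedged sketch of what such a request payload looks like, following OpenAI’s 2023-era chat-completions shape. The model name and temperature value are illustrative, and nothing here actually sends a request:

```python
# Sketch of the parameters the API exposes but the webui hides.
# The payload shape follows OpenAI's 2023-era chat-completions format;
# the concrete values are purely illustrative.

def build_chat_request(model: str, system_prompt: str, user_prompt: str,
                       temperature: float = 1.0) -> dict:
    """Assemble a chat-completion request with an explicit system prompt
    and sampling temperature, two things the webui does not let you set."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

req = build_chat_request(
    model="gpt-4",
    system_prompt="You are a terse senior engineer. Answer in bullet points.",
    user_prompt="Review this function for race conditions.",
    temperature=0.2,  # lower temperature = more deterministic sampling
)
print(req["temperature"], len(req["messages"]))  # → 0.2 2
```

Since the webui fixes all of these behind the scenes, API benchmarks and webui experience can legitimately diverge, which is the point being made above.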

3 points

I use it every day in my job and the quality of answers only drops off when prompts are poorly crafted.

Same. It saves me a lot of time both at work and when I’m working on my personal projects. But you need to ask proper questions to get proper answers.

2 points

quality of answers only drops off when prompts are poorly crafted

And people think GPT4 is approaching AGI …

Not to mention the vulnerability to “poisoning” if trained on the very output it creates.

23 points

So what are the fundamentals in prompt engineering?

27 points

It’s impossible for me to comprehensively summarise in a comment because everyone has different use cases.

Personally, every new ‘project’ of mine requires a new chat. I first teach chatgpt-4 who I am, what I do, and how I want gpt-4 to assist me. Then I ask it to generate a project profile and to analyse documents using plugins.

The key is to work step-by-step and develop a string of prompts. Once I’m happy gpt-4 understands the project, I ask it to draft an overview/outline using headings and subheadings.

Lastly, I work on each section individually, ‘filling in’ the actual content. Then I edit and ask it to review problematic sections.

Most people, as far as I can tell, seem to think it’s a single ask-and-answer process. It’s not. I often need to draft about 10 prompts – about 3000 words – in order to generate one 10-page document.

I think the most important fundamental is to use templates. Pro tip: use gpt-4 to teach you how to develop your prompt templates.
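The commenter doesn’t share their actual templates, so here is a purely hypothetical sketch of what a reusable “project kickoff” prompt template might look like; all the field names are invented:

```python
from string import Template

# Hypothetical sketch of a reusable prompt template. The field names
# ($role, $project, $goal, $task) are invented for illustration; the
# point is that a template makes you supply the same project context
# every time instead of improvising it per chat.

KICKOFF = Template(
    "I am a $role working on $project. "
    "My goal is: $goal. "
    "Assist me by $task, and ask clarifying questions before answering."
)

prompt = KICKOFF.substitute(
    role="technical writer",
    project="an internal API style guide",
    goal="a 10-page draft with headings and subheadings",
    task="generating an outline I can fill in section by section",
)
print(prompt.startswith("I am a technical writer"))  # → True
```

A template like this is also easy to iterate on: when a chat goes badly, you fix the template once rather than re-typing the context from scratch.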

6 points

Please tell me more about document analysis plugins. This workflow is so much more tooled to using GPT for work projects.

3 points

How long on average would you say it takes to generate your prompt template for a project?

0 points

This is exactly how I use it. It seems that some people can’t figure this out by themselves.

2 points

Which is ironic, as it seems like their way could be more work than doing it themselves.

4 points

Sounds like you spend all day talking to a robot and then copy/paste its final output.
When you eventually pass these 10-page documents down the line, do you cite your source?

2 points

Do you have an anonymized example of one of these templates? I’m curious to see what they may look like.

74 points

Back in my day, we used to call ‘prompt engineering’ ‘asking a question’.


back in my day, we called it “google fu”


google fu

That reminds me of something. I don’t remember precisely why, nor what line it resonates with in my brain, but it reminds me of this guy from The Core (movie):
Image link for compatibility

1 point

That’s DJ Qualls, who was great in Z Nation. Too bad no one watched Z Nation. It was hilariously insane. I mean, radioactive post-nuke zombies? At least it got an ending.

38 points

And then we had to actively unlearn that google fu because google no longer works with keywords, but rather has an NLP pipeline that expects a question.

25 points

So that’s why I can’t find shit. I always just use keywords, asking a whole question seems almost wasteful.

1 point

Someone used the phrase “dead-catting” on here the other day, so I went to Google to figure out what the hell that meant. It gave me results for the Catechism. Between the actual phrase “dead cat bounce” and “Catechism”, it chose the latter to show me.

1 point

I still say “Google fu”!

-2 points

It’s more than that, because half the time it doesn’t even answer the question.

13 points

They’ve got to have special terminology because what they do is oh so special. Some AI users act like they’re Louise Banks from the movie Arrival, cracking the code to an alien language or something. And I don’t think it’s far-fetched to assume they’re often from the same breed who had NFT monkeys as their Twitter pfp about 18 months ago.

4 points

Blockchain > Crypto > NFTs > LLMs > whatever’s next.

These people will always be sniffing around for the next big thing to oversell and fleece their audience.

5 points

LMAO people forgot Metaverse even happened

0 points

when i think of “prompt engineering” i think more of stuff like this paper

139 points

The free version of ChatGPT DEFINITELY is dumber than it was even a couple of months ago. I used to be able to get decent, useful code reviews out of it; now it barely knows how to write a nested loop.

Its storytelling capabilities fell off a cliff too. The drive toward safely sanitized, inoffensive-at-all-times output has rendered every story, choose-your-own-adventure, or collaborative role-playing game a sterile, empty expression of black-and-white storytelling, with no nuance allowed and saintly goodness the only possible choice.

In my own experience, ChatGPT has been massively nerfed for the use cases I used it for.

-31 points

Why did they do this? Did the government step in and force them to nerf it because it was too powerful for citizens to use?

24 points

I’m sorry, but this sounds more like a conspiracy theory than a real concern. Occam’s razor probably says it’s expensive to run the service at full power. ChatGPT already generated a cult-like following for AI, so there’s no need to spend a ton on the service, and they can profit off the hype.

Not that OpenAI is held back by a government somehow afraid that it will empower the people. To do what? Revolution?

6 points

I don’t think anybody stepped in, I’m only talking about the free version. It makes some sense they’d gimp it in order to make more people sign up for the paid version, I guess

0 points

It has not gotten worse for coding. GPT-4 is much better, if anything. And it’s total bullshit that it can’t write a nested loop.

I use it daily for work, so I’d definitely know.

4 points

I know he didn’t say he wasn’t using GPT-4, but it seems pretty clear. So saying it’s bullshit that GPT-3.5 is dumber than 4 is pretty inaccurate.

2 points

Fair enough. Saying ChatGPT has gotten dumber is false; saying 3.5 has might be true!

14 points

Sorry I should have mentioned I’m talking about the free version of chatGPT

14 points

I should honestly have understood that! Never mind then, glad we could clear that up

7 points

Don’t know why you’re downvoted. I use GPT4 to code and design infrastructure and it’s very, very good. Around 500% productivity boost.

5 points

Glad someone is realizing what I am!

39 points

I tried a naughtybot for some steamy chat for the first time yesterday. I ended up in fight with the bitch.

4 points

Hahahaha that’s brilliant.

15 points

65B LLaMA models can actually do a pretty good job of writing. Here are two examples with Guanaco 65B (the bold part is my prompt):

tags: fairy tale, animals, dog, wolf, fox, death, humor, dark storyline, violence, loss

Animals in a forest setting investigate a series of gruesome murders that recently occurred. Written in the style of classics like Animal Farm and Watership Down, the writing is vivid and very descriptive. The dialog is thoughtful and often quite witty. It’s clear the author really understands animal behavior and has done their research well!

Chapter 1: An Unfortunate Discovery

A fox by species, a detective by trade, you’ll only have reason to know my name if the worst has come to pass. It’s nearly winter and a bitter wind drives stinging sleet into my face, but it’s still not as chilling as the scene before me now…

I was summoned here by an urgent message from a panicked hedgehog. She told me she found her friend lying dead on his doorstep with his throat ripped out - black fur matted with blood. I rushed over immediately and saw for myself what had happened. He looked so peaceful despite being cut down in his prime - a young wolf just starting doing well for himself - taken too soon. His home nearby showed no signs of struggle or forced entry. Whoever did this must have caught him off guard somehow. I didn’t hear any howling either. That would have been easy enough to miss though since he lived alone deep in these woods where few creatures tread. I made sure to secure the crime scene and look around carefully without disturbing anything more than necessary until other investigators arrived. They should be here soon. We can’t afford to lose evidence in cases like this. Times are already hard enough as it is…

As I waited and pondered who might have committed such a horrible act, something felt wrong about all this. Surely a predator like me could smell another predator’s scent? But there were none that stood out among the melange of odors lingering here. How strange…

Soon after, several others joined me including some friends of mine. A stoic bear named Growler, grumpy badger called Grumpus, and a kindly old raccoon who goes by “Doc” due to his extensive knowledge of medicinal plants and wildlife biology. Together we began our investigation proper by gathering clues and questioning nearby witnesses while trying to minimize contamination through contact with them (we took care to put up caution tape first). Unfortunately, they hadn’t seen much useful apart from occasional glimpses of movement deeper in the undergrowth. Still, we decided to check those areas next along with searching the surrounding woodlands thoroughly for any additional victims. And find one we did, although sadly she suffered the same fate as poor George. Another young wolf lay beside a stream, body cold and stiffening. Someone snapped her neck cleanly like a twig. This couldn’t be coincidence surely? Not two deaths in quick succession like this.


<prompt> before me or what I must relay when I return home. My mind races with questions as I survey this brutal crime scene; who could do such a thing? Why was no one there to stop them? And how many more victims lie undiscovered further out in these woods? A lone paw print leads away from where a rabbit family once happily resided, now reduced to bloody scraps scattered among splintered twigs and torn fur. This isn’t just some random predator attack, something far more sinister is at play here…

I turn back toward town, determined to find answers. As I trudge through the snow, each step feels heavier than the last. How will I break the news to those closest to them? What words can possibly bring comfort after so much senseless suffering? I shake my head, trying to focus on the task at hand. There are others depending on me right now, they need someone strong and steady to guide them through this darkness. I won’t let them down.

The first person I see upon entering town is old man Badger, his white whiskers blown about by the gusting wind. He looks up at me expectantly, “Well Fox, did you find anything?” His voice quavers slightly, betraying his concern for the missing families.

“I’m afraid I have bad news,” I say somberly, “there were signs of a struggle near the burrow entrance, but no survivors.”

Badger sighs heavily, “Dear me, this is worse than we feared. We must alert the council immediately!”

We hurry over to the meeting hall, a large hollow tree that serves as our central gathering place. Inside, several other animals are already waiting anxiously for word of their loved ones. They look up hopefully as we enter, then quickly realize the grim tidings we bear. A collective groan fills the room, followed by muffled sobs and whispered prayers.

Mayor Bear calls for order, her deep voice cutting through the din. She motions for us to join her at the front of the room, “Please tell us everything you saw, Fox. Every detail may be important.”

34 points

Writer here. Very sorry to contradict you, but this is absolute shit. It looks good on the surface, but that’s all.

16 points

Yeah, while it’s cool that a computer can make a story, I have yet to see one that you would think was written by a human and would want to read.

15 points

Very sorry to contradict you, but this is absolute shit.

To be clear, I’m talking in relative terms. Would you argue that ChatGPT did a massively better job and didn’t write “absolute shit”?

It looks good on the surface, but that’s all.

From some of the stuff I’ve seen published, that might just be enough for certain people. I could even be that “certain people” from time to time, sometimes just the right theme, setting and some time to fill is sufficient.

7 points

Do you know of any good alternatives for role playing? I used it a while back to flesh out some NPCs and locations for a DnD game I was planning to run, but if it’s gotten noticeably worse I’d like to try something else.

1 point

Ai dungeon

3 points

Definitely stay away from AI Dungeon - they have a long history of privacy, moderation, and censorship concerns. And, relevant to this discussion, players have repeatedly noticed decreased functionality of the AI in favor of censoring it further (a very similar mindset to OpenAI, which they used to work with very closely).

5 points

NovelAI - They even train their own models specifically for storytelling (and to avoid undue censorship from an outside model provider like OpenAI)

4 points

Really? I actually found it’s gotten less restrictive recently. Maybe it’s just because now I’ve learned to control the context so it doesn’t perceive a request as offensive.

1 point

I find the quality is controllable to a degree by instructing it which sources to use.

Obviously proofread the damn thing and fix any glaring errors.

permalink
report
parent
reply
14 points

I feel like it is still too early to talk about “AI cannibalization” or “feedback loops”, as that would mean a big proportion of the training data is itself AI-generated content, relative to everything else that can be scraped off the internet or the public domain. I don’t think this is happening yet.

What people might experience instead, and perceive as dumbness, is this: the datasets used to train these AIs cannot really change much in a short time (unless we wait another hundred years for humans to produce enough original content to retrain on), and the mathematical models used to build answers from those datasets stay pretty much the same. So a person talking with ChatGPT will, over time, increasingly perceive that the answers are built using a “pattern” or “structure”, that is, the model derived from feeding the dataset into the training process.

Just my two cents on this. Let’s also consider that it is in human nature to be excited about something new that sounds cool, and then to get bored once you’ve grown accustomed to it and pushed it to its boundaries.

7 points

I think this article is just clickbait for dead internet people.

3 points

The resources needed for inference on the original models OpenAI released were unsustainable with the current number of users. They had to “dumb down” the models to handle the request load. It’s unfortunately normal. What I don’t understand is why they don’t offer “premium” packages for the best “old” models.

