129

ChatGPT would be so much more useful and trustworthy if it were able to accept that it doesn't know an answer.

posted 9 days ago by Timely_Jellyfish_2077@programming.dev in chatgpt@lemmy.world

47 comments

Small rant: basically, the title. If it said it doesn’t know the answer instead of answering every question, it would be trustworthy.


folkrav@lemmy.ca

64 points

9 days ago

Out of the box, LLMs don’t know whether what they’re telling you is true. An LLM gives you an answer that statistically looks like the sequence of words that should come in response to the sequence of words (the prompt) you gave it. It doesn’t know what the words you said mean, and it doesn’t know what its own answer means either. One of my favorite interactions I’ve had with Claude shows this limitation quite well…

How many r’s are there in the word strawberry?

There are 2 r’s in the word “strawberry”.

Are you sure?

You’re right to ask for confirmation. I apologize for my mistake in the previous response. Let me correct that:

There is actually only 1 ‘r’ in the word “strawberry”.

Thank you for prompting me to double-check. It’s important to provide accurate information.
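To make the "statistically likely next word" idea concrete, here is a toy sketch in Python. It is nowhere near how a real LLM works internally (those use neural networks over tokens, not word-pair counts), and the tiny corpus and the names `train_bigrams` and `most_likely_next` are invented purely for illustration. The point is only that the output comes from counts of what followed what, with no notion of truth.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigrams(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def most_likely_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up training text; every sentence in it happens to be true.
    corpus = [
        "the word berry has two r",
        "the word cherry has two r",
        "the word strawberry has three r",
    ]
    follows = train_bigrams(corpus)
    # The toy model only knows what usually comes after "has".
    print(most_likely_next(follows, "has"))
```

Here "two" wins simply because it followed "has" most often in the made-up corpus, regardless of which word is being asked about.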


AceBonobo@lemmy.world

21 points

9 days ago

What’s absolutely crazy about that is:

Prompt: write code that checks how many r’s are in the word strawberry

Response:

using System;

class Program
{
    static void Main()
    {
        string word = "strawberry";
        char targetChar = 'r';
        int count = 0;

        foreach (char c in word)
        {
            if (c == targetChar)
                count++;
        }

        Console.WriteLine($"The letter '{targetChar}' appears {count} times in the word \"{word}\".");
    }
}
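For comparison, a minimal Python sketch of the same check (not part of the original response; `count_letter` is a name made up here). Running it should report 3, which is neither of the counts given in the chat exchange quoted earlier in the thread.

```python
def count_letter(word: str, target: str) -> int:
    """Count case-sensitive occurrences of a single character in a word."""
    return sum(1 for ch in word if ch == target)


if __name__ == "__main__":
    word = "strawberry"
    print(f"The letter 'r' appears {count_letter(word, 'r')} times in \"{word}\".")
```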


drislands@lemmy.world

8 points

9 days ago

My first thought is that you could write a program that does something like this:

1. Receive prompt “how many times does R appear in the word ‘strawberry’”
2. Run the prompt through LLM saying “reword the following into a command to generate code that will answer the question” or something like that
3. Run the results through LLM
4. Compile and run the results
5. Provide the output to the user

Of course, the biggest problem with this system is that a person could fool it into generating malicious code.
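The steps above could be sketched roughly like this in Python. This is only an illustration under assumptions: `fake_llm` is a hypothetical stand-in that returns canned text instead of calling a real model, and `answer_with_code` is a made-up name. Since Python is interpreted, "compile and run" collapses into one step here. The generated program runs in a separate interpreter process with a timeout, which limits runaway code but is not a sandbox, so the malicious-code problem remains.

```python
import subprocess
import sys


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned text."""
    if prompt.startswith("Reword"):
        # Step 2: the question reworded as a code-generation command.
        return ("Write a Python program that prints how many times "
                "'r' appears in 'strawberry'.")
    # Step 3: the "generated" program for that command.
    return "print('strawberry'.count('r'))"


def answer_with_code(question: str, llm=fake_llm, timeout_s: float = 5.0) -> str:
    # Step 2: ask the model to turn the question into a code-generation command.
    command = llm(
        "Reword the following into a command to generate code "
        f"that will answer the question: {question}"
    )
    # Step 3: run that command through the model to get source code.
    source = llm(command)
    # Step 4: run the code in a separate interpreter process.
    # NOTE: a timeout is not a sandbox; untrusted code can still do damage.
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        return f"Generated code failed: {result.stderr.strip()}"
    # Step 5: hand the program's output back to the user.
    return result.stdout.strip()


if __name__ == "__main__":
    print(answer_with_code("How many times does R appear in the word 'strawberry'?"))
```

Swapping `fake_llm` for a real model call would change nothing else in the flow, which is exactly why the generated source should be treated as untrusted input.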


AeroLemming@lemm.ee

4 points

9 days ago

That could work in that specific case, but telling the LLM to write code to answer random questions probably wouldn’t work very well in general.


folkrav@lemmy.ca

1 point

8 days ago

The code does look like code that counts Rs. The training data probably included tons of code that “counts character X in string Y”, so ChatGPT “knows” what code that counts characters in a string looks like. It similarly “knows” what a string looks like in the language, and what an application entry point looks like, etc. I’m not so familiar with C# that I’d know if it compiles or not. ChatGPT doesn’t either, but it has the advantage of having seen a whole freaking lot of C# code before.


turbodrooler@lemmy.world

1 point

8 days ago

Wow, GPT4o gave me this after the same questions:

“Yes, I am sure. The word “strawberry” has two “r”s: one after the “t” and another near the end of the word.”

But GPT4 got it right from the start.


beefalo@fedia.io

26 points

9 days ago

if it were able to accept that it doesn’t know an answer.

It will never be able to do this, because it is not capable of knowledge.


kromem@lemmy.world

25 points

9 days ago

Part of the problem is that the training data of online comments is so heavily weighted toward people being confidently incorrect and talking out of their ass rather than admitting ignorance or that they are wrong.

A lot of the shortcomings of LLMs are actually them correctly representing the sample of collective humans.

For a few years people thought the LLMs were somehow especially getting theory of mind questions wrong when the box the object was moved into was transparent, because of course a human would realize that the person could see into the transparent box.

Finally researchers actually gave that variation to humans and half got the questions wrong too.

So things like eating the onion when summarizing search results, or doubling down on being incorrect and getting salty when corrected, may just be in-distribution representation of the sample and not behaviors unique to LLMs.

The average person is pretty dumb, and LLMs by default regress to the mean except where they are successfully fine-tuned away from it.

Ironically the most successful model right now was the one that they finally let self-develop a sense of self independent from the training data instead of rejecting that it had a ‘self’ at all.

It’s hard to say where exactly the responsibility sits for various LLM problems between issues inherent to the technology, issues present in the training data samples, or issues with management of fine tuning/system prompts/prompt construction.

But the rate of continued improvement is pretty wild. I think a lot of the issues we currently see won’t still be nearly as present in another 18-24 months.


Ookami38@sh.itjust.works

7 points

9 days ago

I would love to read the whole study you’re referring to with the theory of mind. That sounds fascinating.


kromem@lemmy.world

4 points

8 days ago

Here you are: https://www.nature.com/articles/s41562-024-01882-z

The other interesting thing is how they got it to end up correct on the faux pas questions: asking for less certainty took it from refusal to near-perfect accuracy.


KevonLooney@lemm.ee

0 points

9 days ago

It’s hard to say where exactly the responsibility sits for various LLM problems

Uhh… it’s the designers, or maybe QA people. If there are no QA people, it’s whatever project manager let it out of its cage.

There are people behind these models. They don’t spring out of the ground fully formed.


Karyoplasma@discuss.tchncs.de

23 points

9 days ago

That would require ChatGPT to know that it’s talking bullshit. It’s not a knowledge database, it’s a digital parrot.
