240

Why AI detectors think the US Constitution was written by AI(arstechnica.com)

posted 1 year ago by jocanib@lemmy.world in technology@lemmy.world


jocanib@lemmy.world (OP)

13 points

1 year ago

It will almost always be detectable if you just read what is written. Especially for academic work. It doesn’t know what a citation is, only what one looks like and where they appear. It can’t summarise a paper accurately. It’s easy to force laughably bad output by just asking the right sort of question.

The simplest approach for setting homework is to give them the LLM output and get them to check it for errors and omissions. LLMs can’t critique their own work and students probably learn more from chasing down errors than filling a blank sheet of paper for the sake of it.


sadreality@kbin.social

1 point

1 year ago

Chad comment right here…


54 points

1 year ago

Given how much AI has advanced in the past year alone, saying it will “always” be easy to spot is extremely short-sighted.


Terrasque@infosec.pub

1 point

1 year ago

Some things are inherent in the way current LLMs work. An LLM doesn’t reason, it doesn’t understand, it just predicts the next word out of likely candidates based on the previous words. It can’t look ahead to know if it’s got an answer, and it can’t backtrack to change previous words if it later finds out it’s written itself into a corner. It won’t even know it’s written itself into a corner, it will just continue predicting in the pattern it’s seen, even if it makes little or no sense to a human.

It just mimics the source data it’s been trained on, following the patterns it’s learned there. At no point does it have any sort of understanding of what it’s saying. In some ways it’s similar to this, where a man learned how enough French words were written to win the national Scrabble competition, without any clue what the words actually mean.

And until we get a new approach to LLMs, we can only improve it by adding more training data and more layers allowing it to pick out more subtle patterns in larger amounts of data. But with the current approach, you can’t guarantee that what it writes will be correct, or even make sense.


nulldev@lemmy.vepta.org

5 points

1 year ago

it just predicts the next word out of likely candidates based on the previous words

An entity that can consistently predict the next word of any conversation, book, news article with extremely high accuracy is quite literally a god because it can effectively predict the future. So it is not surprising to me that GPT’s performance is not consistent.

It won’t even know it’s written itself into a corner

In many cases it does. For example, if GPT gives you a wrong answer, you can often just send an empty message (single space) and GPT will say something like: “Looks like my previous answer was incorrect, let me try again: blah blah blah”.

And until we get a new approach to LLM’s, we can only improve it by adding more training data and more layers allowing it to pick out more subtle patterns in larger amounts of data.

This says nothing. You are effectively saying: “Until we can find a new approach, we can only expand on the existing approach” which is obvious.

But new approaches come all the time! Advances in tokenization come all the time. Every week there is a new paper with a new model architecture. We are not stuck in some sort of hole.


Terrasque@infosec.pub

-3 points

1 year ago

An entity that can consistently predict the next word of any conversation, book, news article with extremely high accuracy is quite literally a god because it can effectively predict the future

I think you’re reading something there other than what I said. Look, today’s LLMs ingest a ton of text - more accurately, tokens - and build up statistics of which tokens they see in which context. So if you see the sentence "A nice cup of ", statistically the next word is maybe 48% coffee, 28% tea, 17% water and so on. If earlier in the text it says something about heating a cup of oil, then oil will have a much higher chance. The model then picks one of the top tokens at (weighted) random, and the text (an array of tokens) is fed into the LLM again and a new prediction is made. And so it continues until you stop the loop (usually on an end token or a keyword you’re looking for). Larger LLMs are better at spotting more subtle patterns - or, more accurately, they have more layers of statistics applied - but they still have the fundamental issue of going one token at a time and just going by what’s most likely to be the next token.
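
A minimal sketch of the loop described above, assuming a toy lookup table in place of a real model. The table, the `oil` probability, the `<end>` marker and the function names are all made up for illustration; a real LLM computes these probabilities with a neural network over the whole context, not a dictionary keyed on the last word.

```python
import random

END = "<end>"  # hypothetical end-of-text marker

# Hypothetical statistics: probability of the next token given the last one.
# The coffee/tea/water figures come from the comment; "oil" is invented so
# the row sums to 1.
NEXT_TOKEN_PROBS = {
    "of": {"coffee": 0.48, "tea": 0.28, "water": 0.17, "oil": 0.07},
}


def sample_next(tokens, rng, top_k=3):
    """Pick one of the top_k most likely next tokens at weighted random."""
    # Any token without an entry in the toy table simply ends the text.
    candidates = NEXT_TOKEN_PROBS.get(tokens[-1], {END: 1.0})
    top = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    words = [word for word, _ in top]
    weights = [prob for _, prob in top]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(prompt, rng, max_tokens=10):
    """Append one sampled token at a time until the end marker or the limit."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_token = sample_next(tokens, rng)
        if next_token == END:
            break
        tokens.append(next_token)  # the grown list is the next input
    return tokens


if __name__ == "__main__":
    print(" ".join(generate(["A", "nice", "cup", "of"], random.Random())))
```

Each pass only ever adds one token to the end of the list; nothing in the loop revisits earlier choices, which is the point being argued here.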

In many cases it does. For example, if GPT gives you a wrong answer, you can often just send an empty message (single space) and GPT will say something like: “Looks like my previous answer was incorrect, let me try again: blah blah blah”.

Have you tried that when it’s correct too? And in the case you mention, it has a clean break and then starts anew with token generation, allowing it to go down a different path. You can see it more clearly when experimenting with local LLMs that have fewer layers to maintain the illusion.

This says nothing. You are effectively saying: “Until we can find a new approach, we can only expand on the existing approach” which is obvious.

But new approaches come all the time! Advances in tokenization come all the time. Every week there is a new paper with a new model architecture. We are not stuck in some sort of hole.

We’re trying to make a flying machine by improving pogo sticks. No matter how well you design the pogo stick and the spring, it will not be a flying machine.


nulldev@lemmy.vepta.org

5 points

1 year ago

*

The issue here is that you are describing the goal of LLMs, not how they actually work. The goal of an LLM is to pick the next most likely token. However, it cannot achieve this via rudimentary statistics alone because the model simply does not have enough parameters to memorize which token is more likely to go next in all cases. So yes, the model “builds up statistics of which tokens it sees in which contexts” but it does so by building its own internal data structures and organization systems which are complete black boxes.
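
To put rough numbers on "not enough parameters to memorize", here is a back-of-the-envelope calculation. The vocabulary size (50,257) and parameter count (175 billion) are GPT-3's published figures; the 10-token context and the idea of one table row per possible context are simplifying assumptions for illustration only.

```python
VOCAB_SIZE = 50_257            # GPT-3 tokenizer vocabulary size
PARAMETERS = 175_000_000_000   # GPT-3 parameter count
CONTEXT_LENGTH = 10            # assumption: a deliberately tiny context


def lookup_table_rows(vocab_size, context_length):
    """Number of distinct contexts a pure lookup table would need a row for."""
    return vocab_size ** context_length


def smallest_context_exceeding(vocab_size, budget):
    """Shortest context length whose table already has more rows than budget."""
    length = 1
    while lookup_table_rows(vocab_size, length) <= budget:
        length += 1
    return length


rows = lookup_table_rows(VOCAB_SIZE, CONTEXT_LENGTH)
print(f"{rows:.2e} possible contexts vs {PARAMETERS:.2e} parameters")
print("table outgrows the parameters at context length",
      smallest_context_exceeding(VOCAB_SIZE, PARAMETERS))
```

Under these assumptions the table outgrows the parameter budget after only a few tokens of context, so whatever the model stores has to be far more compressed than a list of counts.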

Also, going “one token at a time” is only a “limitation” because LLMs are not accurate enough. If LLMs were more accurate, then generating “one token at a time” would not be an issue because the LLM would never need to backtrack.

And this limitation only exists because there isn’t much research into LLMs backtracking yet! For example, you could give LLMs a “backspace” token: https://news.ycombinator.com/item?id=36425375
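
As a rough illustration of what a "backspace" token would mean on the output side (this is not the training method from the linked discussion, and the token name is hypothetical), the generated stream could be post-processed like this:

```python
BACKSPACE = "<backspace>"  # hypothetical special token


def apply_backspaces(tokens):
    """Resolve backspace tokens: each one deletes the token emitted before it."""
    output = []
    for token in tokens:
        if token == BACKSPACE:
            if output:          # a backspace with nothing before it is ignored
                output.pop()
        else:
            output.append(token)
    return output


if __name__ == "__main__":
    stream = ["2", "+", "2", "=", "5", BACKSPACE, "4"]
    print(" ".join(apply_backspaces(stream)))
```

The model would still emit one token at a time; the difference is that one of the tokens it can emit retracts the previous one.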

Have you tried that when it’s correct too? And in that case you mention it has a clean break and then start anew with token generation, allowing it to go a different path. You can see it more clearly experimenting with local LLM’s that have fewer layers to maintain the illusion.

If it’s correct, then it gives a variety of responses. The space token effectively just makes it reflect on the conversation.

We’re trying to make a flying machine by improving pogo sticks. No matter how well you design the pogo stick and the spring, it will not be a flying machine.

To be clear, I do not believe LLMs are the future. But I do believe that they show us that AI research is on the right track.

Building a pogo stick is essential to building a flying machine. By building a pogo stick, you learn so much about physics. Over time, you replace the spring with some gunpowder to get a mortar. You shape the gunpowder into a tube to get a model rocket and discover the pendulum rocket fallacy. And finally, instead of gunpowder, you use liquid fuel and you get a rocket that can go into space.


Terrasque@infosec.pub

-4 points

1 year ago

The issue here is that you are describing the goal of LLMs, not how they actually work.

No, I am describing how they actually work.

it cannot achieve this via rudimentary statistics alone because the model simply does not have enough parameters to memorize which token is more likely to go next in all cases.

True, hence the limitations. That would require infinite storage and infinite compute capability.

Also, going “one token at a time” is only a “limitation” because LLMs are not accurate enough.

No, it’s done because one letter at a time is too slow. Tokens are a “happy” medium tradeoff.

The space token effectively just makes it reflect on the conversation.

It makes a “break” of the block, which lets it start a new answer instead of continuing on the previous. How it reacts to that depends on the fine tune and filters before the data hits the LLM.

To be clear, I do not believe LLMs are the future.

I have just said that the LLMs we have today can’t fix the problems with false data and hallucinations, because it’s a core principle of how they operate. It will require a new approach.

You could add a rocket engine and wings to a pogo stick, but then it’s no longer a pogo stick but an airplane with a weird landing gear. Today’s LLMs could give us hints on how to make a better AI, but that would be a different thing than today’s LLMs. From what has been leaked from OpenAI, GPT4 has scaling issues, so they use a mixture of experts. Just throwing hardware at it is already showing diminishing returns. And we’re learning fascinating new ways of training them, but the inherent problem is the same.

For example, if you ask an LLM if it can give an answer to a question, it will have two paths to go down, positive and negative. Note that at the point where it chooses, it doesn’t know how it will finish the answer; it doesn’t look ahead. But if it sees, for example, that 80% of the answers in the texts it’s been trained on start with a positive, then it will most likely start with “yes” - and when it does that it will continue to generate an answer - often a very convincing and plausibly real-looking answer, because it has already committed to that path.
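
A toy sketch of that commitment effect, assuming a hand-written table of conditional probabilities (every entry is invented) and always taking the single most likely option rather than sampling:

```python
# Hypothetical model: maps the text generated so far to next-piece probabilities.
MODEL = {
    (): {"Yes": 0.8, "No": 0.2},
    ("Yes",): {", the answer is": 1.0},
    ("Yes", ", the answer is"): {" 42.": 0.6, " 17.": 0.4},
    ("No",): {", I don't know.": 1.0},
}


def greedy_decode(model):
    """Repeatedly append the most likely next piece given the prefix so far."""
    prefix = ()
    while prefix in model:
        options = model[prefix]
        best = max(options, key=options.get)
        prefix = prefix + (best,)  # committed: later steps condition on this
    return "".join(prefix)


if __name__ == "__main__":
    print(greedy_decode(MODEL))
```

The first step picks "Yes" purely because it is the more common opening; whether a good answer exists further down that branch plays no part in the choice, and the "No" branch is never visited.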

And as for the link about teaching it backspace token, the comments there are already pointing out the issue:

It’s interesting that in the examples (Table 3 on page 21), the model uses the backspace token to erase the randomly-added token from the prompt, but it does not seem to ever use the token to correct its own output. I’m curious how frequently the model actually uses this backspace token in practice - and if the answer is “vanishingly rarely”, what is the source of the improved Mauve score and sample diversity they show? Is it just that the different training procedure gives an improvement?

For it to use the backspace, wouldn’t it have to predict the wrong token with greater confidence than the corrected token? I would think this would require more examples of a wrong token + correction than the correct token, which seems a bit odd.

Almost none of the text it’s trained on has a backspace token, and to finetune it in is tricky since it’s a completely new concept - and remember it’s still doing token for token - so it would have to write a token and then right after find out that it’s more likely to send a backspace token than to continue it. It’s interesting, and LLMs can pick up on some crazy patterns, but I’m skeptical.


Kara@kbin.social

25 points

1 year ago

People seem to grasp onto weaknesses AI has now and say that they will have them forever, like how text AI lies, and image generation AI can’t draw hands.

But these AIs are advancing unimaginably quickly. Two years ago generated text was pretty bad and quickly became incoherent, and one year ago generated images were mostly strange mush.


aebrer@kbin.social

3 points

1 year ago

Spot on! Actually people still talk about hands but it’s already been solved with many newer image gen models… The hands they produce look perfectly fine usually these days.


Zeth0s@lemmy.world

24 points

1 year ago

This is not entirely correct, in my experience. With the current version of GPT-4 you might be right, but the initial versions were extremely good. Clearly you have to work with it; you cannot ask for the whole work.


jocanib@lemmy.world (OP)

4 points

1 year ago

That’s not true! There’s heaps of early-GPT articles pointing out how much bullshit it regurgitates (eg Why does ChatGPT constantly lie?). And no evidence at all that the breathless fanboys have even stopped to check.


Zeth0s@lemmy.world

7 points

1 year ago

*

I meant the initial versions of ChatGPT 4. ChatGPT isn’t lying, simply because lying implies a malevolent intent. GPT-4 has no intent, it just provides an output given an input, which can be either wrong or correct. A model able to provide more correct answers is a more accurate model. Computing accuracy for an LLM is not trivial, but GPT-4 is still a good model. The user has to know how to use it, what to expect and how to evaluate the result. If they are unable to do so it’s completely their fault.

Why are you so pissed off at a good NLP model?


Anomander@kbin.social

2 points

1 year ago

I’m no GPT booster, but I think that the real problem with detectability here

It will almost always be detectable if you just read what is written. Especially for academic work.

is that it requires you to know the subject and content already, and to be giving the paper a relatively detailed reading. For a rube reading the paper, trying to learn from it - a lot of GPT content is easily mistaken as legitimate. And it’s getting better. We’re not safe simply assuming that AI today is as good as it will ever get and the clear errors we can detect cannot ever be addressed.

Penetrating academic writing, for academics, is probably one of the highest barriers of any writing task, AI or not.

But being dismissive of the threat of AI content because it’s not able to convincingly fake some of the hardest writing that real people do is maybe sidestepping a lot of much more casual writing - that still carries significance and consequence.


Asifall@lemmy.world

8 points

1 year ago

I think there’s a big difference between being able to identify an AI by talking to it and being able to identify something written by an AI, especially if a human has looked over it for obvious errors.


Tyler_Zoro@ttrpg.network

3 points

1 year ago

What you are describing is true of older LLMs. It’s less true of GPT4. GPT5, or whatever it is they are training now, will likely begin to shed these issues.

The shocking thing that we discovered that led to all of this is that this sort of LLM continues to scale in capabilities with the quality and size of the training set. AI researchers were convinced that this was not possible until GPT proved that it was.

So the idea that you can look at the limitations of the current generation of LLM and make blanket statements about the limitations of all future generations is demonstrably flawed.


jocanib@lemmy.world (OP)

2 points

1 year ago

They cannot be anything other than stochastic parrots because that is all the technology allows them to be. They are not intelligent, they don’t understand the question you ask or the answer they give you, they don’t know what truth is let alone how to determine it. They’re just good at producing answers that sound like a human might have written them. They’re a parlour trick. Hi-tech magic 8balls.


Tyler_Zoro@ttrpg.network

4 points

1 year ago

They cannot be anything other than stochastic parrots because that is all the technology allows them to be.

Are you referring to humans or AI? I’m not sure you’re wrong about humans…


jocanib@lemmy.world (OP)

-4 points

1 year ago

Sam Altman is a know-nothing grifter. HTH


tate@lemmy.sdf.org

-1 points

1 year ago

*

That article is super helpful.

Thanks!


nulldev@lemmy.vepta.org

4 points

1 year ago

Have you even read the article?

IMO it does not do a good job of disproving that “humans are stochastic parrots”.

The example with the octopus isn’t really about stochastic parrots. It’s more about how LLMs are not multi-modal.


nulldev@lemmy.vepta.org

8 points

1 year ago

LLMs can’t critique their own work

In many cases they can. This is commonly used to improve their performance: https://arxiv.org/abs/2303.11366
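
The general shape of such a self-critique loop can be sketched as below. This is a generic generate, critique, retry pattern under stated assumptions, not the algorithm from the linked paper; `toy_generate` and `toy_critique` are hypothetical stand-ins for what would be two calls to a language model.

```python
from typing import Callable, List, Optional


def refine(
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> str:
    """Generate an answer, then retry with accumulated feedback until the
    critic has no objection or the round limit is reached."""
    feedback: List[str] = []
    answer = generate(task, feedback)
    for _ in range(max_rounds):
        problem = critique(task, answer)
        if problem is None:       # critic is satisfied
            break
        feedback.append(problem)  # carry the criticism into the next attempt
        answer = generate(task, feedback)
    return answer


# Toy stand-ins so the sketch runs without a model.
def toy_generate(task: str, feedback: List[str]) -> str:
    return "5" if not feedback else "4"


def toy_critique(task: str, answer: str) -> Optional[str]:
    return None if answer == "4" else "2 + 2 is not 5"


if __name__ == "__main__":
    print(refine(toy_generate, toy_critique, "What is 2 + 2?"))
```

How useful the loop is depends entirely on how reliable the critic is, which is the question being argued in this thread.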


jocanib@lemmy.world (OP)

-1 points

1 year ago

*accurately


nulldev@lemmy.vepta.org

5 points

1 year ago

Whoops, meant to say: “In many cases, they can accurately (critique their own work)”. Thanks for correcting me!
