OpenAI just admitted it can’t identify AI-generated text. That’s bad for the internet and it could be really bad for AI models.

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

135 points

Text written before 2023 is going to be exceptionally valuable because that way we can be reasonably sure it wasn’t contaminated by an LLM.

This reminds me of some research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any radiation anywhere. However, after America and the Soviet Union started nuking stuff like there’s no tomorrow, pretty much all steel on Earth has been a little bit contaminated. Not a big issue for normal people, but scientists building super sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel

47 points

The background radiation did go up, but saying “there was hardly any radiation anywhere” is wrong. Today’s steel (and background radiation) is pretty much back to pre-nuke levels. See: Low-background steel; Background radiation.

26 points

It is also worth noting that we can make steel with little or no radioactive contamination; it’s just really expensive and hard, and it happens in very small quantities.

1 point

We could even make isotopically pure iron, yeah.

6 points

Not really. If it’s truly impossible to tell the text apart, then it doesn’t really pose a problem for training AI. Otherwise, next-gen AI will be able to tell apart text generated by current-gen AI, and it will get filtered out. So only the most recent data will have unfiltered shitty AI-generated stuff, but they don’t train AI on super-recent text anyway.

28 points

This is not the case. Model collapse is a studied phenomenon for LLMs and leads to deteriorating quality when models are trained on the data that comes from themselves. It might not be an issue if there were thousands of models out there but there are only 3-5 base models that all the others are derivatives of IIRC.
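
A toy illustration of the effect described above, not an LLM experiment: a Gaussian stands in for the model, and each generation is fitted only to samples drawn from the previous generation. The function name `simulate_collapse`, the sample size, and the generation count are all assumptions picked for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, samples_per_gen=10, seed=0):
    """Refit a Gaussian to its own samples and track its spread.

    Returns the fitted standard deviation after each generation.
    Toy stand-in for a model trained on its own output; not an LLM.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the original "human" distribution
    history = [std]
    for _ in range(generations):
        # Generate a synthetic dataset from the current model...
        data = [rng.gauss(mean, std) for _ in range(samples_per_gen)]
        # ...and fit the next model to that synthetic data only.
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: std = {history[gen]:.6f}")
```

With small samples the fitted spread tends to drift toward zero over many generations, so the tails of the original distribution are the first thing lost. How closely this carries over to real LLM training is a separate question that this sketch does not answer.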

1 point

People still tap into the real world, while AI does not do that yet. Once AI is able to actively learn from real-world sensors, the problem might disappear, no?

1 point

I don’t see how that affects my point.

  • Today’s AI detector can’t tell apart the output of today’s LLM.
  • Future AI detector WILL be able to tell apart the output of today’s LLM.
  • Of course, future AI detector won’t be able to tell apart the output of future LLM.

So at any point in time, only recent text could be “contaminated”. The claim that “all text after 2023 is forever contaminated” just isn’t true. Researchers would simply have to be a bit more careful including it.

54 points

The wording of every single article has such an anti AI slant, and I feel the propaganda really working this past half year. Still nobody cares about advertising companies, but LLMs are the devil.

Existing datasets still exist. The bigger focus is in crossing modalities and refining content.

Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don’t even know half of how the machines work or what they are doing.

63 points

Probably because LLMs threaten to (and have already started to) shittify a truly incredible number of things like journalism, customer service, books, scriptwriting, etc., all in the name of increased profits for a tiny few.

55 points

again, the issue isn’t the technology, but the system that forces every technological development into functioning “in the name of increased profits for a tiny few.”

that has been an issue for the fifty years prior to LLMs, and will continue to be the main issue after.

removing LLMs or other AI will not fix the issue. why is it constantly framed as if it would?

we should be demanding the system adjust for the productivity increases we’ve already seen, as well as for what we expect in the near future. the system should make every advancement a boon for the general populace, not the obscenely wealthy few.

even the fears of propaganda. the wealthy can already afford to manipulate public discourse beyond the general public’s ability to keep up. the bigger issue is in plain sight, but is still being largely ignored for the slant that “AI is the problem.”

22 points

Yep, the problem was never LLMs, but billionaires and the rich. The problem has always been the rich, for thousands of years, and yet they have been immensely successful at deflecting attacks against them onto other groups for those thousands of years. They will claim it’s Chinese immigrants, or blacks, or Mexicans, or gays, or trans people. Now LLMs and AI are the new boogieman.

We should be talking about UBI, not LLMs.

16 points

It’s a capitalism problem not an AI or copyright problem.

6 points

This isn’t a technological issue, it’s a human one

I totally agree with everything you said, and I know that it will never ever happen. Power is used to get more power. Those in power will never give it up, only seek more. They intentionally frame the narrative to make the more ignorant among us believe that the tech is the issue rather than the people that own the tech.

The only way out of this loop is for the working class to rise up and murder these cunts en masse

Viva la revolucion!

5 points

I completely agree with you, AI should be seen as a great thing, but we all know that the society we live in will not pass those benefits to the average person; in fact, it’ll probably be used to make life worse. From a leftist perspective it’s very easy to see this, but from the normie position, at least in the US, people aren’t thinking about how our society slants AI towards being evil and scary, they just think AI is evil and scary. Again, I completely agree with what you’ve said; it’s just important to remember how reactionary the average person is.

4 points

Exactly. I work in AI (although not the LLM kind, just applying smaller computer vision models), and my belief is that AI can be a great liberator for humanity if we have the right political and economic apparatus. The question is what that apparatus is. Some will say it’s an inherent feature of capitalism, but that’s not terribly specific, nor does it explain the relatively high wealth equality that existed briefly during the middle of the 20th century in America. I think some historical context is important here.

Historical Precedent

During the Industrial Revolution, we had an unprecedented growth in average labor productivity due to automation. From a naïve perspective, we might expect increasing labor productivity to result in improved quality of life and fewer working hours, i.e., the spoils of that productivity being felt by all.

But what we saw instead was the workers lived in squalor and abject poverty, while the mega-rich captured those productivity gains and became stupidly wealthy.

Many people at the time took note of this and sought to answer this question: why, in an era of greater-than-ever labor productivity, is there still so much poverty? Clearly all that extra wealth is going somewhere, and if it’s not going to the working class, then it’s evidently going to the top.

One economist and philosopher, Henry George, wrote a book exploring this very question, Progress and Poverty. His answer, in short, was rent-seeking:

Rent-seeking is the act of growing one’s existing wealth by manipulating the social or political environment without creating new wealth.[1] Rent-seeking activities have negative effects on the rest of society. They result in reduced economic efficiency through misallocation of resources, reduced wealth creation, lost government revenue, heightened income inequality,[2] risk of growing political bribery, and potential national decline.

Rent-seeking takes many forms. To list a few examples:

  • Land speculation
  • Monopolization of finite natural resources (e.g., oil, minerals)
  • Offloading negative externalities (e.g., pollution)
  • Monopolization of intellectual property
  • Regulatory capture
  • Monopolistic or oligopolistic control of entire markets

George’s argument, essentially, was that the privatization of the economic rents borne of god-given things — be it land, minerals, or ideas — allowed the rich and powerful to extract all that new wealth and funnel it into their own portfolios. George was not the only one to blame these factors as the primary drivers of sky-high inequality; Nobel-prize winning economist Joseph Stiglitz has stated:

Specifically, I suggest that much of the increase in inequality is associated with the growth in rents — including land and exploitation rents (e.g., arising from monopoly power and political influence).

George’s proposed remedies were a series of taxes and reforms to return the economic rents of those god-given things to society at large. These include:

Land value taxes are generally favored by economists as they do not cause economic inefficiency, and reduce inequality.[2] A land value tax is a progressive tax, in that the tax burden falls on land owners, because land ownership is correlated with wealth and income.[3][4] The land value tax has been referred to as “the perfect tax” and the economic efficiency of a land value tax has been accepted since the eighteenth century.

A Pigouvian tax (also spelled Pigovian tax) is a tax on any market activity that generates negative externalities (i.e., external costs incurred by the producer that are not included in the market price). The tax is normally set by the government to correct an undesirable or inefficient market outcome (a market failure) and does so by being set equal to the external marginal cost of the negative externalities. In the presence of negative externalities, social cost includes private cost and external cost caused by negative externalities. This means the social cost of a market activity is not covered by the private cost of the activity. In such a case, the market outcome is not efficient and may lead to over-consumption of the product.[1] Often-cited examples of negative externalities are environmental pollution and increased public healthcare costs associated with tobacco and sugary drink consumption.[2]
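
A worked example may make the definition above concrete. This is a sketch with made-up numbers, assuming linear demand P = a - b*q, a flat marginal private cost, and a flat marginal external cost per unit. The function names and all values are illustrative assumptions, not taken from the quoted text.

```python
def equilibrium_quantity(a, b, marginal_cost):
    """Quantity where linear demand P = a - b*q meets a flat marginal cost."""
    return max(0.0, (a - marginal_cost) / b)


def pigouvian_example(a=100.0, b=2.0, private_cost=20.0, external_cost=10.0):
    """Compare untaxed, socially optimal, and taxed quantities (toy numbers)."""
    # Untaxed market: buyers and sellers consider only the private cost.
    q_market = equilibrium_quantity(a, b, private_cost)
    # Social optimum: the price must cover private plus external cost.
    q_optimal = equilibrium_quantity(a, b, private_cost + external_cost)
    # Pigouvian tax: set equal to the marginal external cost.
    tax = external_cost
    q_taxed = equilibrium_quantity(a, b, private_cost + tax)
    return {
        "q_market": q_market,
        "q_optimal": q_optimal,
        "tax": tax,
        "q_taxed": q_taxed,
    }


if __name__ == "__main__":
    for name, value in pigouvian_example().items():
        print(f"{name}: {value}")
```

In this toy setup the untaxed market produces more than the social optimum, and a per-unit tax equal to the external cost brings the taxed quantity in line with the optimal one.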

Severance taxes are taxes imposed on the removal of natural resources within a taxing jurisdiction. Severance taxes are most commonly imposed in oil producing states within the United States. Resources that typically incur severance taxes when extracted include oil, natural gas, coal, uranium, and timber. Some jurisdictions use other terms like gross production tax.

such as in the Norwegian model:

The key to Norway’s success in oil exploitation has been the special regime of ownership rights which apply to extraction: the severance tax takes most of those rents, meaning that the people of Norway are the primary beneficiaries of the country’s petroleum wealth. Instead of privatizing the resource rents provided by access to oil, companies make their returns off of the extraction and transportation of the oil, incentivizing them to develop the most efficient technologies and processes rather than simply collecting the resource rents. Exploration and development is subsidized by the Norwegian government in order to maximize the amount of resource rents that can be taxed by the state, while also promoting a highly competitive environment free of the corruption and stagnation that afflicts state-controlled oil companies.

  • Intellectual property reform, e.g., abolishing patents and instead subsidizing open R&D, similar to a Pigouvian anti-tax (research has positive externalities) or Norway’s subsidization of oil exploration
  • Implementation of a citizen’s dividend or universal basic income, e.g., the Alaska permanent fund or carbon tax-and-dividend:

Citizen’s dividend is a proposed policy based upon the Georgist principle that the natural world is the common property of all people. It is proposed that all citizens receive regular payments (dividends) from revenue raised by leasing or taxing the monopoly of valuable land and other natural resources.

This concept is a form of universal basic income (UBI), where the citizen’s dividend depends upon the value of natural resources or what could be titled as common goods like location values, seignorage, the electromagnetic spectrum, the industrial use of air (CO2 production), etc.[4]

In 1977, Joseph Stiglitz showed that under certain conditions, beneficial investments in public goods will increase aggregate land rents by at least as much as the investments’ cost.[1] This proposition was dubbed the “Henry George theorem”, as it characterizes a situation where Henry George’s ‘single tax’ on land values, is not only efficient, it is also the only tax necessary to finance public expenditures.[2] Henry George had famously advocated for the replacement of all other taxes with a land value tax, arguing that as the location value of land was improved by public works, its economic rent was the most logical source of public revenue.[3]

Subsequent studies generalized the principle and found that the theorem holds even after relaxing assumptions.[4] Studies indicate that even existing land prices, which are depressed due to the existing burden of taxation on labor and investment, are great enough to replace taxes at all levels of government.[5][6][7]

(continued)

3 points

It is a completely understandable stance in the face of the economic model, though. Your argument could be fitted to explain why firearms shouldn’t be regulated at all. It isn’t the technology, so we should allow the sale of actual machine guns (outside of weird loopholes) and grenade launchers.

The reality is that the technology is targeted by the people affected by it because we are hopeless in changing the broader system which exists to serve a handful of parasitic non-working vampires at the top of our societies.

Edit: not to suggest that I’m against AI and LLM. I want my fully automated luxury communism and I want it now. However, I get why people are turning against this stuff. They’ve been fucked six ways from Sunday and they know how this is going to end for them.

Plus, a huge amount of AI doomerism is being pushed by the entrenched monied AI players, like OpenAI and Meta, in order to use a captured government to regulate potential competition out of existence.

3 points

Technology is but a tool. It cannot tell you how to use it. If it’s in the hands of a writer, it’s a helpful sounding board. If it’s in the hands of a Netflix producer, it’s an anti-labor tool. We need to protect people’s livelihoods.

2 points

Journalism and customer service can’t possibly get worse than they already are.

Books and movies are not at risk - there will always be lots of people willing to write good content for both, and the best content will be published. And “the best” will be a hybrid of humans and AI working together - which is what has some people in that industry so scared. Just like factory workers were scared when machines entered that industry.

It’s an irrational fear - there are still factory workers today. Probably more than ever. And there will still be human writers - it’s an industry that will never go away.

If, however, you refuse to work with AI… then yeah, you’re fucked. Pretty soon you’ll be unemployable and nobody will publish your work, which is why the movie publishers aren’t going to budge. They recognise a day is coming where they can’t sell movies and TV shows that were made exclusively by humans, and they are never going to sign a contract locking them into a dead-end path.

4 points

You make it sound like the quality is what will increase with human/AI partnership. What will realistically happen is an expected rate of output. Why can’t you deliver a book every year with an AI ghost writing? People who work slowly or meticulously will be phased out by those that can quickly throw together a collage of their own words and their guided filler. It’s amazing and futuristic. It can be very useful and inspirational. But I do not share your optimism that it will make creative industries better. It will allow a single person to put together a script that would’ve taken a team… But the better content will now be drowning in this sea. Unfortunately, I expect an equivalent of the media explosion that happened when the reality tv format became an ok thing, eventually leading to shows half filmed on phones. The end result will be double the marvel movies every year.

4 points

Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don’t even know half of how the machines work or what they are doing.

Yah I think it’s fairly obvious that people are both fascinated and scared by the tech and also acknowledge that under a different economic structure, it would be extremely beneficial for everyone and not just for the very few. I think it’s more annoying that people like you assume that everyone is some sort of diet Luddite when they’re just trying to see how the tool has the potential to disrupt many, many jobs and probably not in a good way. And don’t give me this tired comparison about the industrial revolution because it’s a complete false equivalence.

3 points

I am so tired of techno-fetishist AI bros complaining every single time any of the many ways in which AI will devastate and rot our daily lives is brought up.

“It’s not the tech! It’s the economic system!”

As if they’re different things? Who is building the tech? Who is pouring billions into the tech? Who is protecting the tech from proper regulation, smartass? I don’t see any worker coops using AI.

“You don’t even know how it works!”

Just a thought-terminating cliché to try to avoid any discussion or criticism of your precious little word generators. No one needs to know how a thing works to know its effects. The effects are observable reality.

Also, nobody cares about advertising companies? What the hell are you on about?

1 point

they are different things. it’s not exclusively large companies working on and understanding the technology. there’s a fantastic open-source community, and a lot of users of their creations.

would destroying the open-source community help prevent the big-tech from taking over? that battle has already been lost and needs correction. crying about the evil of A.I. doesn’t actually solve anything. “proper” regulation is also relative. we need entirely new paradigms of understanding things like “I.P.” which aren’t based on a century of lobbying from companies like disney. etc.

and yes, understanding how something works is important for actually understanding the effects, when a lot of tosh is spewed from media sites that only care to say what gets people to engage.

i’d say a fraction of what i see as vaguely directed anger towards anything A.I. is actually relegated to areas that are actual severe and important breaches of public trust and safety, and i think the advertising industry should be the absolute focal point on the danger of A.I.

Are you also arguing against every other technology that has had their benefits hoarded by the rich?

0 points

It’s mostly large companies. Some models are open source (of which only some are also community driven), but the mainstream ones are the ones being entirely funded by, legally protected by, and pushed onto everything by capitalist oligarchs.

What other options do you have? I’m sick and tired of people like you seeing workers lose their jobs, seeing real people used like meat puppets by the internet, seeing so many artists risking their livelihoods, seeing that we’ll have to lose faith in everything we see and read because it could be unrecognizably falsified, and CLAIMING you care about it, only to complain every single time any regulation or way to control this is proposed, because you either don’t actually care and are just saying it for rhetoric, or you do care but only to the point you can still use your precious little toys restriction-free. Just overthrow the entire economic system of all countries on earth, otherwise don’t do anything, let all those people burn! Do you realize how absurd you sound?

It’s sociopathic. I don’t say it as an insult, I say it applying the definition of a word: it’s a complete lack of empathy and care for your fellow human beings, it’s viewing an immaterial piece of technology, nothing but a thoughtless word generator, as inherently worth more than the livelihood of millions. I’m absolutely sick of it. And then you have the audacity to try to seem like the reasonable ones when arguing about this, knowing if you had your way so many would suffer. Framing it as anti-capitalism knowing that if you had your way you’d pave the way for the oligarchs to make so many more billions off of that suffering.

31 points

We built a machine to mimic human writing. There’s going to be a point where there is no difference. We might already be there.

12 points

The machine used to mimic human text uses human text. If it can’t find the difference between its text and human text, it will begin using AI text to mimic human text. This will eventually lead to errors, repetitions, and/or less human-like text.

2 points

We are already seeing it 1 year into GPT as human authors bow out when not paid.

25 points

Predictable issue if you knew the fundamental technology that goes into these models. Hell it should have been obvious it was headed this way to the layperson once they saw the videos and heard the audio.

We’re less sensitive to patterns in massive data than these machines are, so the point at which we can’t tell fact from AI fiction based on the content comes before the point at which the machines can’t tell. Good luck with the FB aunts.

A GAN’s final goal is to develop content that is indistinguishable… Are we surprised?

Edit since the person below me made a great point. GANs may be limited, but there’s nothing that says you can’t set up a generator LLM and a detector LLM with the sole purpose of improving the generator.

22 points

For laymen who might not know how GANs work:

Two AIs are developed at the same time: one that generates and one that discriminates. The generator creates a dataset, it gets mixed in with some real data, then all of that gets fed into the discriminator, whose job is to say “fake or not”.

Both AI get better at what they do over time. This arms race creates more convincing generated data over time. You know your generator has reached peak performance when its twin discriminator has a 50/50 success rate. It’s just guessing at that point.

There literally cannot be a better AI than the twin discriminator at detecting that generator’s work. So anyone trying to make tools to detect ChatGPT’s writing is going to have a very hard time of it.
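
The 50/50 end state can be illustrated without training anything. A minimal sketch, assuming the real data and the generator’s output are both one-dimensional Gaussians with unit variance, mixed half and half, and that the discriminator is the best possible one for that pair (it picks whichever source is more likely). `best_discriminator_accuracy` is a hypothetical helper for this example, not part of any GAN library.

```python
import math


def best_discriminator_accuracy(generator_mean, real_mean=0.0):
    """Accuracy of the optimal real-vs-fake classifier.

    Assumes real ~ N(real_mean, 1), fake ~ N(generator_mean, 1), and an
    even mix of both. The optimal rule thresholds at the midpoint of the
    two means, giving accuracy Phi(gap / 2), where Phi is the standard
    normal CDF.
    """
    gap = abs(generator_mean - real_mean)
    return 0.5 * (1.0 + math.erf((gap / 2.0) / math.sqrt(2.0)))


if __name__ == "__main__":
    # As the generator's output drifts toward the real data, even the
    # best possible discriminator is reduced to guessing.
    for generator_mean in (4.0, 2.0, 1.0, 0.5, 0.0):
        acc = best_discriminator_accuracy(generator_mean)
        print(f"generator mean {generator_mean:3.1f}: best accuracy = {acc:.3f}")
```

When the two distributions coincide, the accuracy is exactly 0.5, which is the "just guessing" point described above. Real text distributions are far messier than two Gaussians, so this only shows the shape of the argument.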

6 points

Fantastically put!

1 point

Tx!

3 points

Unless I’m mistaken, aren’t GANs mostly old news? Most of the current SOTA image generation models and LLMs are either diffusion-based, transformers, or both. GANs can still generate some pretty darn impressive images, even from a few years ago, but they proved hard to steer and were often trained to generate a single kind of image.

1 point

I haven’t been in decision analytics for a while (and people smarter than I are working on the problem) but I meant more along the lines of the “model collapse” issue. Just because a human gives a thumbs up or down doesn’t make it human written training data to be fed back. Eventually the stuff it outputs becomes “most likely prompt response that this user will thumbs up and accept”. (Note: I’m assuming the thumbs up or down have been pulled back into model feedback).

Per my understanding that’s not going to remove the core issue which is this:

Any sort of AI detection arms race is doomed. There is ALWAYS new ‘real’ video for training, and even if GANs are a bit outmoded, the core concept of using synthetically generated content to train is a hot thing right now. Technically, whoever creates fake videos to train on would have a bigger training set than the checkers.

Since we see model collapse when we feed too much of this back to the model we’re in a bit of an odd place.

We’ve not even had an LLM available for the entire year, but we’re already having trouble distinguishing.

Making waffles, so I only did a light Google, but I don’t really think ChatGPT is leveraging GANs for its main algos; my point is simply that the GAN concept could easily be applied to LLM text to make delineation even harder.

We’re probably going to need a lot more tests and interviews on critical reasoning and logic skills. Which is probably how it should have been but it’ll be weird as that happens.

sorry if grammar is fuckt - waffles

1 point

So a few tidbits you reminded me of:

  • You’re absolutely right: there’s what’s called an alignment problem between what the human thinks looks superficially like a quality answer and what would actually be a quality answer.

  • You’re correct in that it will always be somewhat of an arms race to detect generated content, as lossy compression and metadata scrubbing can do a lot to make an image unrecognizable to detectors. A few people are trying to create some sort of integrity check for media files, but it would create more privacy issues than it would solve.

  • We’ve had LLMs for quite some time now. I think the most notable release in recent history, aside from ChatGPT, was GPT2 in 2019, as it introduced a lot of people to the concept. It was one of the first language models that was truly “large,” although they’ve gotten much bigger since the release of GPT3 in 2020. RLHF and the focus on fine-tuning for chat and instructability wasn’t really a thing until the past year.

  • Retraining image models on generated imagery does seem to cause problems, but I’ve noticed fewer issues when people have trained FOSS LLMs on text from OpenAI. In fact, it seems to be a relatively popular way to build training or fine-tuning datasets. Perhaps training a model from scratch could present issues, but generally speaking, training a new model on generated text seems to be less of a problem.

  • Critical reading and thinking was always a requirement, as I believe you say, but certainly it’s something needed for interpreting the output of LLMs in a factual context. I don’t really see LLMs themselves outperforming humans on reasoning at this stage, but the text they generate certainly will make those human traits more of a necessity.

  • Most of the text models released by OpenAI are so-called “Generative Pretrained Transformer” models, with the keyword being “transformer.” Transformers are a separate model architecture from GANs, but are certainly similar in more than a few ways.

23 points

On the one hand, our AI is designed to mimic human text, on the other hand, we can detect AI generated text that was designed to mimic human text. These two goals don’t align at a fundamental level


Technology

!technology@lemmy.world
