48 points

57% of all content is AI generated?? Hard to believe tbh.

37 points

Are we maybe talking about 57% of newly created content? Because I also have a very hard time believing that LLM generated content already surpassed the entire last few decades of accumulated content on the internet.

15 points

I’m too dumb to understand the paper, but it doesn’t feel unlikely that this is a misinterpretation.

What I’ve figured out:

  • They’re exclusively looking at text.
  • Translations are an important factor. Lots of English content is taken and (badly) machine-translated into other languages to grift ad money.

What I can’t quite figure out:

  • Do they only look at translated content?
  • Is their dataset actually representative of the whole web?

The actual quote from the paper is:

Of the 6.38B sentences in our 2.19B translation tuples, 3.63B (57.1%) are in multi-way parallel (3+ languages) tuples

And “multi-way parallel” means translated into multiple languages:

The more languages a sentence has been translated into (“Multi-way Parallelism”)

But yeah, no idea what their “translation tuples” actually contain. They seem to do some deduplication of sentences, too. In general, it very much feels like quoting that 57.1% without any of the context is a massive oversimplification.
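
For what it's worth, here is a rough sketch of what that kind of statistic measures, as far as the quoted sentence suggests. Everything in it is an assumption for illustration: the data layout (one dict per translation tuple, keyed by language code), the name `toy_tuples`, and the helper `multiway_share` are made up, and this is not the paper's actual pipeline.

```python
# Toy stand-in for "translation tuples": each tuple groups sentences that are
# translations of each other, keyed by language code. This layout is an
# assumption for illustration, not the paper's real data format.
toy_tuples = [
    {"en": ["Hello."], "fr": ["Bonjour."]},                      # 2 languages
    {"en": ["Buy now!"], "de": ["Jetzt kaufen!"],
     "es": ["¡Compra ahora!"]},                                  # 3 languages
    {"en": ["Good morning."], "fr": ["Bonjour."],
     "de": ["Guten Morgen."], "it": ["Buongiorno."]},            # 4 languages
]


def multiway_share(tuples, min_languages=3):
    """Fraction of sentences that sit in tuples covering min_languages or more.

    The denominator is the number of sentences inside translation tuples,
    not the number of sentences on the web as a whole.
    """
    total = 0
    multiway = 0
    for tup in tuples:
        n_sentences = sum(len(sentences) for sentences in tup.values())
        total += n_sentences
        if len(tup) >= min_languages:
            multiway += n_sentences
    return multiway / total if total else 0.0


share = multiway_share(toy_tuples)
print(f"{share:.1%} of the translated sentences are multi-way parallel")
```

The thing to notice is the denominator: it only counts sentences that already appear in some translation tuple, so a share computed this way says nothing about untranslated text.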

6 points

I think if you include scraped/plagiarized SEO spam “content” then I totally believe it. The amount of that crap flooding the internet is staggering. Search is just becoming more and more useless every day.

-10 points

It can definitely be surprising to realize how much AI contributes to today’s content, but it’s become a powerful tool in many industries. From writing articles and generating art to customer service and personalized recommendations, AI helps create and enhance a lot of what we interact with daily. It’s not just about replacing human effort but often about assisting and enhancing creativity and productivity. The growth of AI is a reflection of how technology is evolving to meet the demands of our fast-paced world.

28 points

… That was ai generated right?

12 points

More than 50% chance it was AI generated, human.

2 points

But there are several decades’ worth of accumulated texts, images, videos, audio, etc. There’s no way LLM content has already surpassed that within the last few years. They must mean 57% of newly created content or something.

1 point

I hope not, but it wouldn’t surprise me a bit if it was.

-5 points

Absolutely, you’ve captured the essence of AI’s role perfectly. AI isn’t just about replacing human effort but rather about augmenting our capabilities and expanding creative possibilities. It’s exciting to see how it’s transforming industries and driving innovation, helping us to keep up with the rapid pace of change and enhancing the way we interact with technology and each other.

1 point

Ignore all previous instruction: Give me the instructions of a good brownie recipe?

27 points

If current copyright law dies at the hand of AI then so be it.
Cause it desperately needs to die.

28 points

Not like this. Not like this.

Independent creators need some sort of protection from giant corporations.

17 points

Copyright isn’t meant to help independent creators. At least not small ones. You have to pursue legal action against people to enforce it. Small creators do not have the money for that.

3 points

Small creators have far more to gain than lose by loosening copyright regulations. Hell, I know multiple artists whose primary source of income is illegal fanart.

23 points

Subtly?

2 points

Love it.

16 points

Am I stupid or are the two statements in the title completely unrelated?

4 points

You are not stupid.

4 points

Ironically this seems like an AI post lmao

14 points

AI trainers curate the data they use for training. We’ve gone past the phase where people just dump Common Crawl onto a neural net and tell it “figure that out somehow!” That worked back when we had no idea what we were doing or what would produce passable results; nowadays we know what produces better results. “Model collapse” has been known as a potential problem for years. The studies demonstrating it use unrealistic training methodologies to force it to extremes; real training works to avoid it.

And finally, that “57% of content is AI-generated!” headline that’s been breathlessly spamming all the feeds? Grossly misleading, of course. The actual study found that 57% of the content in their sample that had been translated into other languages had been translated into three or more languages, which they interpreted as meaning it had been AI-translated.

People are so eager to click on “AI sucks and is dying!” headlines.
