Who needs humans when you have AI :p


Yeah, or LLMs succumb to model collapse and just keep getting worse until the fad eventually dies.


It is easy to have too many cooks in the kitchen, but that is an easy problem to solve. Model decay is not a real problem if you understand how an LLM works. Overtraining is like burning a big dinner and ruining a meal: one doesn’t stop cooking forever, or burn down the house and quit. You just cook another meal next time. If your model has 100 trillion tokens, you’re likely to try your very best to salvage the massive ruined dish, but in the end it doesn’t matter; you can easily tweak the recipe for next time.

Models have no persistent memory. Context can be collected and turned into training data, but that is a totally separate thing from the model itself. As an oversimplification, an LLM is a large database of categories mixed with a massive amount of language data, which enables a statistical calculation of what word should come next. At its core, it is simply predicting the next word. Everything else is censoring algorithms and illusions embedded in how humans use language. Really, this is a tool to access culture through language, and in the case of larger models, the culture embedded in many different human languages.
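The “statistical calculation of what word should come next” can be illustrated with a toy bigram model. This is a hypothetical Python sketch, far simpler than a real LLM (which uses a neural network over subword tokens rather than a table of word counts); `train_bigrams` and `predict_next` are made-up names for illustration only.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if it never had one."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

In this tiny corpus “cat” follows “the” twice, more often than any other word, so it is the prediction; a word that never had a follower (such as the final “fish”) returns `None`.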

This is as much of a “fad” now as the internet was in the late ’90s, and it is on par with that change. LLMs are no fad; this is a tool as disruptive as the public internet. For instance, in 10 years, Google will be a relic of the past, completely replaced by AI. Education will also completely change, since entirely individualized education becomes possible. Psychology will change, as an LLM can be tuned to address and help with many human social issues. This will change everything, and it already exists in the open-source space.


If we assume LLMs are as revolutionary as you are suggesting, then how is model collapse an easy problem to solve? If Google is a relic of the past and the internet is filled with AI-generated content, then where will the training data come from? We can’t replace human-generated content with AI-generated content without an inevitable model collapse.

Oh and btw, good luck with differentiating between human generated and AI generated. Already, social media sites are being cluttered with AI-generated content, with Amazon book publishing being cluttered with shit tier LLM generated “books” (cheap imitations). If academia and entertainment go the same way, as many speculate, there will be hardly anything left.


> Oh and btw, good luck with differentiating between human generated and AI generated.

One easy way to do this is to check whether it was created before 2023. There isn’t much AI-generated content from before then.

> Amazon book publishing being cluttered with shit tier LLM generated “books”

So filter the books based on how “shit tier” they are.

In the end, what’s needed to train AIs is good content. If some of that good content is itself AI-generated, who cares? You need to be selective in how you pick training material anyway.
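The two filters suggested above (a pre-2023 date check and a quality check) could be combined roughly like this. It is a minimal Python sketch under stated assumptions: the `Document` fields, the `quality` score (from some unspecified classifier), the 0.8 threshold, and the premise that creation dates are trustworthy are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cutoff taken from the "before 2023" suggestion.
CUTOFF = date(2023, 1, 1)


@dataclass
class Document:
    text: str
    created: date    # assumed to be a trustworthy creation date
    quality: float   # hypothetical 0-1 score from some quality classifier


def select_for_training(docs: list[Document], min_quality: float = 0.8) -> list[Document]:
    """Keep every pre-cutoff document; keep newer ones only if they score as good content."""
    return [d for d in docs if d.created < CUTOFF or d.quality >= min_quality]


docs = [
    Document("old forum post", date(2019, 5, 1), 0.4),
    Document("new low-effort ebook", date(2024, 2, 1), 0.2),
    Document("new well-written article", date(2024, 3, 1), 0.9),
]
kept = select_for_training(docs)
print([d.text for d in kept])
```

The old post is kept regardless of its score, the low-effort new ebook is dropped, and the high-scoring new article is kept even though it might be AI-generated, which matches the “who cares?” point. A stricter pipeline could apply the quality check to old documents too.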


Why would they? There’s plenty of non-AI-generated material to train them on, and it’s something future trainers will watch out for.


Sure, there may be a lot, but it’s still finite. And already, social media is being filled with AI-generated content. If the trend continues, human-generated content will be dwarfed by AI-generated content, and it won’t be a simple process to distinguish between the two.


Infinite training data isn’t required.

It’s actually fine to include some AI-generated data in your training set. “Model collapse” happens when you train on only AI-generated content and end up losing some of the less-common outputs. Without the less-common cases in the training data, each generation of AI has less diverse information to learn from. If you make sure the training set is diverse enough, it should be fine.

If all else fails, just make sure a lot of your data is from before 2023.
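The collapse mechanism described above can be shown with a toy simulation. This Python sketch is hypothetical and much simpler than real training: the “model” is only a frequency table over 200 made-up outputs with a Zipf-like long tail, each generation is “trained” by re-estimating frequencies from samples, and the generation count, sample size, and 30% human fraction are arbitrary choices. `simulate` returns how many distinct outputs the model can still produce at each generation.

```python
import random
from collections import Counter


def make_human_distribution(n_outputs: int = 200) -> dict[int, float]:
    """Zipf-like 'human' distribution: a few common outputs, a long tail of rare ones."""
    weights = {k: 1.0 / (k + 1) for k in range(n_outputs)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def sample(dist: dict[int, float], n: int, rng: random.Random) -> list[int]:
    """Draw n outputs from a distribution (this is 'generating' text)."""
    outputs = list(dist)
    return rng.choices(outputs, weights=[dist[o] for o in outputs], k=n)


def fit(data: list[int]) -> dict[int, float]:
    """'Train' a model by estimating output frequencies from the data."""
    counts = Counter(data)
    return {k: c / len(data) for k, c in counts.items()}


def simulate(generations: int, n: int, human_fraction: float, seed: int = 0) -> list[int]:
    """Return the number of distinct outputs the model can produce at each generation."""
    rng = random.Random(seed)
    human = make_human_distribution()
    model = fit(sample(human, n, rng))  # generation 0 is trained on human data only
    diversity = [len(model)]
    n_human = int(n * human_fraction)
    for _ in range(generations):
        # Each new training set mixes the previous model's output with fresh human data.
        data = sample(model, n - n_human, rng) + sample(human, n_human, rng)
        model = fit(data)
        diversity.append(len(model))
    return diversity


pure = simulate(generations=30, n=500, human_fraction=0.0)
mixed = simulate(generations=30, n=500, human_fraction=0.3)
print("AI-only:", pure[0], "->", pure[-1])
print("30% human:", mixed[0], "->", mixed[-1])
```

With AI-only data, an output that draws zero samples in some generation gets probability zero and can never return, so diversity can only shrink. Mixing in fresh human samples keeps reintroducing rare outputs, so the mixed run should end with noticeably more distinct outputs.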


Science Fiction

!sciencefiction@lemmy.world
