Who needs humans when you have AI :p
This is about what I expect from a half-decent AI model with a fairly simple prompt and an out-of-the-box tokenizer.
Prompt engineering will be a major new career path in the coming decades. Good output takes knowing how to fine-tune the model's training, knowing what exists in the base dataset and how to access it, tuning the settings, and powerful hardware running lots of iterations. All of humanity’s creativity can be harnessed across languages and cultures using this tech. Mastering it will be a real challenge. It is probably the most complicated tool or instrument humans have ever created. You are really commenting on the skill behind the prompt, not the technology. It will take quite a while for this to normalize and become mainstream. I highly recommend trying it out for yourself. FOSS AI is a thing. There are lots of offline tools and options available already. Soon, open source will dominate this space too.
Yeah, or LLMs succumb to model collapse and just keep getting worse until eventually the fad dies.
It is easy to have too many cooks in the kitchen, but that is an easy problem to solve. Model decay is not a real problem if you understand how an LLM works. Overtraining is like burning a big dinner and ruining a meal. One doesn’t stop cooking forever, or burn down the house and quit. You just cook another meal next time. If your model was trained on 100 trillion tokens, you’re likely to try your very best to salvage your massive ruined dish, but in the end, it doesn’t matter. You can easily tweak the recipe for next time. Models have no persistent memory. Context can be turned into data and used for training, but it is a totally separate thing from the model itself. As an oversimplification, an LLM is just a large database of categories mixed with a massive amount of language data that enables a statistical calculation of what word should come next. Everything else is censoring algorithms and illusions embedded into how humans use language. Really, this is a tool to access culture through language, and in the case of larger models, the culture embedded into many different human languages.
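To make the "statistical calculation of what word should come next" idea concrete, here is a toy sketch. It is a plain bigram counter, not how an actual LLM is built (real models use neural networks over tokens, not word-count tables), and the names `train_bigram`, `predict_next`, and the sample corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; a real model sees vastly more text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "dog"))  # never seen, so no prediction
```

Note that the counts are fixed once training is done, which is the "no persistent memory" point: whatever you type at prediction time does not change the table.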
This is as much of a “fad” now as the internet was in the late ’90s, and this is on par with that change. LLMs are no fad. This is a tool as disruptive as the public internet. For instance, in 10 years, Google will be a relic of the past. AI will completely replace it. Education will also completely change. It is possible to have entirely individualized education. Psychology will change, as an LLM can be tuned to address and help with many human social issues. This will change everything because it already exists in the open source space.
If we assume LLMs are as revolutionary as you are suggesting, then how is model collapse an easy problem to solve? If Google is a relic of the past and the internet is filled with AI-generated content, where will the training data come from? We can’t replace human-generated content with AI-generated content without an inevitable model collapse.
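A rough way to see why training on your own output loses information: the sketch below is a deliberately crude stand-in, assuming each "model" simply memorizes its training set and the next generation trains only on samples drawn from it. Real LLM training is far more complex, and the function name `resample_generations` and the numbers are made up for illustration.

```python
import random


def resample_generations(corpus, generations, seed=0):
    """Track how many distinct items survive when each generation
    is trained only on samples drawn from the previous generation."""
    rng = random.Random(seed)
    data = list(corpus)
    distinct = [len(set(data))]
    for _ in range(generations):
        # The new "training set" is the previous model's output:
        # same size, sampled with replacement.
        data = rng.choices(data, k=len(data))
        distinct.append(len(set(data)))
    return distinct


# Start with 1000 distinct "human-written" items (hypothetical stand-ins).
history = resample_generations(range(1000), generations=20)
print(history)
```

The distinct count can never go up, because a generation can only repeat what the previous one contained, so rare items drop out and never come back unless fresh human data is mixed in.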
Oh and btw, good luck differentiating between human-generated and AI-generated content. Social media sites are already being cluttered with AI-generated content, Amazon book publishing is being cluttered with shit-tier LLM-generated “books” (cheap imitations), and if academia goes this way, and entertainment too as many speculate, there’s hardly anything left.
Why would they? There’s plenty of non-AI-generated material to train them off of and it’s something that future trainers will watch out for.
Sure, there may be a lot, but it’s still finite. And already, social media is being filled with AI-generated content. If the trend continues, human-generated content will be dwarfed by AI-generated content. And it’s not going to be a simple process to distinguish between the two.