What human author hasn’t read and been inspired by existing copyrighted works?
It’s not even that uncommon for humans to accidentally copy them too closely later on.
Machines don’t have inspiration; they are not people. They do not make decisions based upon artistic choice or aesthetic preference or half-remembered moments. They are plagiarism machines, trained on millions of protected works and designed for the explicit purpose of putting everyone whose work they copy out of work.
In a vacuum, AI tools are as harmless and benign as you want them to be, but in reality training them is disastrously harmful to the environment, and they are already ruining the livelihoods of human creators who actually make art.
Whenever I see them described as “plagiarism machines”, odds are about 99% that the person using the term has no idea how these models work. Like humans, they can overfit, but most of what they output will have far less in common with any individual work than the level of imitation people engage in all the time without being accused of plagiarism.
As for the environmental effects, it’s a totally ridiculous claim: the power used by the GPUs that trained even the top-of-the-line ChatGPT models adds up to a tiny rounding error next to the power use of even middling online games, and training has only gotten more efficient since.
E.g. researchers at Oak Ridge National Laboratory published a paper in December after training a GPT-4-scale model with only 3k GPUs on the Frontier supercomputer using Megatron-DeepSpeed. 3k GPUs is about 8% of Frontier’s capacity, and while Frontier is currently the fastest, hundreds of publicly known supercomputers operate at that kind of scale, and many more are not publicly known. Never mind the many millions of GPUs that are not part of any supercomputer.
I fully agree with you. I mean, even search engines are fully reliant on the ingestion and storage of copyrighted material.
Of course, the elephant in the room is how we stop multi-billion-dollar companies from advancing the technology far enough to put artists, programmers, writers and the like out of business.
You can’t. The cat is out of the bag. The algorithms are well understood, and new papers on ways to improve the output of far smaller models come out every day. It’s only a matter of time before companies in a whole range of jurisdictions that are entirely unlikely to care will be able to train competitive models.