AlexanderESmith
A.K.A.
@AlexanderESmith
I already replied to the essence of this in my reply to your other post about how “illegal downloads aren’t theft because its a copy”, but I’ll mention here that this is even more evidence that you aren’t a creator, and I suggest that your opinions on this subject aren’t relevant, and you should avoid subjecting other people to them.
The MPAA and music industry would beg to differ. As would the US courts, as well as any court in a country we share copyright agreements with.
Consider that if a movie uses a scene from another movie without permission, or a music producer uses a melody without permission, or either of them uses too much of an existing song without permission, everyone sues everyone else, and they win.
Consider also that if a large corporation uses an individual’s content without permission, we have documented cases of the individual suing, and winning (or settling).
Some other facts to consider:
- An mp3 file is not inherently illegal. Nor is a torrent file/tracker/download.
- If the mp3 file contains audio you don’t own the rights to, it is illegal, same for the torrent you used to download/distribute it. In the eyes of the law, it’s theft.
- A trained LLM or image generation model is not inherently theft, if you only use open-source or licensed/owned content to train it.
- (at odds in our conversation) What of a model that was trained with content the trainer didn’t own?
In the mp3 example, it’s largely an individual stealing from a large company. On the Internet, this is frequently cheered as the user “sticking it to the man” (unless, of course, you’re an indie creator who can’t support yourself because everyone’s downloading your content for free). Discussions regarding the morality of this have been had, and will be had, for a long time, but its legality is a settled matter: It’s not legal.
In the case of “AI” models, it’s large companies stealing from a huge number of individuals who have no support or established recourse.
You’re suggesting that it’s fine because, essentially, the creators haven’t lost anything. This makes it extremely clear to me that you’ve never attempted to support yourself as a creator (and I suspect you haven’t created anything of meaning in the public domain either).
I guess what it comes down to is this: If creators can be stolen from without consequence, what incentive does anyone have to create anything? Are you going to work your 40-60 hours a week, then come home and work another 20-40 hours to create something for no personal benefit other than the act of creation? Truly, some people will. Most won’t.
You’ll waste more time trying to figure out how to do this than it would take to move a monitor and keyboard to the server, do the install, and plug the monitor and keyboard back into your main computer. Once the server is up, you can administer it over the network via ssh.
“Your honor, we can use whatever data we want because model training is probably fair use, or whatever”.
I don’t know what’s worse, the fact that you think creators don’t have the right to dictate how their works are used, or that you apparently have no idea what fair use is.
This might help: https://copyright.gov/fair-use/
This “fair use” argument is excellent if used specifically in the context of “education, not commercialization”. Best one I’ve seen yet, actually.
The only problem is that perplexity.ai isn’t marketing itself as educational, or as a commentary on the work, or as parody. They tout themselves as a search engine. They also have paid “pro” and “enterprise” plans. Do you think they’re specifically contextualizing their training data based on which user is asking the question? I absolutely do not.
you got some criticism and now you’re saying everyone else is a bot or has an agenda
Please look up ad hominem, and stop doing it. Yes, their responses are a distraction from the topic at hand, but so were the random posts calling OP paranoid. I’d have been on the defensive too.
[Our company] publish[s] open source work … anyone is free to use it for any purpose, AI training included
Great, I hope this makes the models better. But you made that decision. OP clearly didn’t. In fact, they attempted to use several methods to explicitly block it, and the model trainers did it anyway.
I think that the anti-AI hysteria is stupid virtue signaling for luddites
Many loudly outspoken figures against the use of stolen data for the training of generative models work in the tech industry, myself included (I’ve been in the industry for over two decades). We’re far from Luddites.
LLMs are here
I’ve heard this used as a justification for using them, and reasonable people can discuss the merits of the technology in various contexts. However, this is not a justification for defending the blatant theft of content to train the models.
whether or not they train on your random project isn’t going to affect them in any meaningful way
And yet, they did it while ignoring explicit instructions to the contrary.
there are more than enough fully open source works to train on
I agree, and model trainers should use that content, instead of whatever they happen to grab off every site they happen to scrape.
Better to have your work included so that the LLM can recommend it to people or answer questions about it
I agree if you give permission for model trainers to do so. That’s not what happened here.
“The world seeing [their] work” is not equal to “Some random company selling access to their regurgitated content, used without permission after explicitly attempting to block it”.
LLMs and image generators that weren’t trained on content wholly owned by the group creating the model are theft.
Not saying LLMs and image generators are innately thievery. It’s like the whole “illegal mp3” argument. mp3s are just files with compressed audio. If they contain copyrighted work, and were obtained illegitimately, THEN they’re thievery. Same with content generators.
Eh. This is not a new argument, and not the first evidence of it. I don’t think you’re gonna be high on their list of retaliation targets, if you register at all (to say nothing of the low-to-middling reach of the fediverse in general).
Hell, just look at photographers/painters v. image generators, or the novel/article/technical authors v. … practically all LLMs really, or any other of a dozen major stories about “AI” absorbing content and spitting out huge chunks of essentially unmodified code/writing/images.