cross-posted from: https://lemmy.intai.tech/post/43759

cross-posted from: https://lemmy.world/post/949452

OpenAI’s ChatGPT and Sam Altman are in massive trouble. OpenAI is getting sued in the US for illegally using content from the internet to train its LLMs, or large language models.

99 points

So we can sue robots but when I ask if we can tax them, and reduce human working hours, I’m the crazy one?

22 points

So we can sue robots

… No?

11 points

What would be the legal argument for this? I’m not against it but I don’t know how it could be argued.

22 points

Legal basis for suing a company that uses another company’s product/creations without approval seems like a fairly straightforward intellectual property issue.

Legal basis for increased taxes on revenue from AI or automation in general could be created the same way any tax is created: through legislation. Income tax didn’t exist before; now it does. Tax breaks for mortgage interest didn’t use to exist; now they do. Levying higher taxes on companies that rely heavily on automated systems and funding a UBI out of the revenue might not exist right now, but it can, if the right people are voted in to pass the right laws.

-3 points

I don’t think a UBI makes sense; for many people it will just be extra money in their pocket to spend, which continues the endless inflation in prices until the gain disappears.

Targeting benefits more efficiently to those who need them would actually help reduce inequality.

16 points

I’m no expert on law but maybe something about AI unethically taking our jobs away

24 points

Universal basic income + AI/robots taking care of all necessary jobs sounds great

11 points

China didn’t take your job and neither will AI. Corporations will replace you with something that costs less.

We can’t really legislate against AI because other countries won’t. It’s also a huge boon for society; we just have to make sure the profits are redistributed and work hours overall are reduced, instead of all the productivity gains going into the pockets of the mega wealthy.

0 points

What makes it unethical? How is it different from advancements in technology taking away any other job, like elevator operators, numerous factory positions, calculators (the human kind), telephone operators, people who sew clothes (somewhat), and so on?

It seems to me that automating away jobs has historically bettered humanity. Why would we want a job to be done by a person when we can have a machine do it (assuming the machine does it equally well or better)? We can better focus people on other jobs and eventually, hopefully, reach a point where no job is mandatory at all.

4 points

It could be argued that when our tax code, laws, and constitution were created, there weren’t AIs taking jobs and funneling the economy toward a few people, breaking the system, and that it’s time for us to adapt as a society. But I know adapting isn’t a strength of our legal system.

Also, you wouldn’t be suing the AI as its own entity. You would be suing the creator/owner that is allowing it to steal people’s content. AI is not at the point where it is sentient and responsible for its own actions.

0 points

That’s actually a great argument: an AI is trained without permission on the results of people’s labor, and is thus able to intercept the need for that labor and take away the financial opportunities derived from it. It could therefore be argued that an AI’s output, and the profit from it, contains a portion that proportionately belongs to the people whose labor its opaque process is based on, in whatever proportion that labor makes up its training. An AI’s owner should therefore pay a portion of its revenue as royalties to the “people’s council”, which in this case is just the government, for it to redistribute accordingly.
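A minimal sketch of the royalty arithmetic being proposed, with a hypothetical `royalty_shares` helper; the royalty rate and contribution figures are made up purely for illustration:

```python
# Hypothetical sketch of the proportional-royalty idea: each contributor's
# cut of the royalty pool is proportional to how much of the training data
# (by whatever measure) came from them.
def royalty_shares(revenue, royalty_rate, contributions):
    """contributions maps contributor -> amount of training data supplied."""
    pool = revenue * royalty_rate          # slice of revenue set aside
    total = sum(contributions.values())    # total training data
    return {who: pool * amount / total
            for who, amount in contributions.items()}

# e.g. a 10% royalty on $1,000,000, split across three contributor groups:
shares = royalty_shares(1_000_000, 0.10,
                        {"authors": 600, "forums": 300, "news": 100})
# shares is roughly {"authors": 60000.0, "forums": 30000.0, "news": 10000.0}
```

The hard part, of course, is the contribution measure itself, which the comment concedes is obscure inside a trained model.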

11 points

What would you tax exactly? Robots don’t earn an income and don’t inherently make a profit. You could tax a company or owner who profits off of robots and/or sells their labor.

1 point

It would have to be some sort of labor-cost-savings tax, moderated and enforced by the government.

9 points

Should we tax bulldozers because they take away jobs from people using shovels? What about farm equipment, since they take away jobs from people picking fruit by hand? What about mining equipment, because they take away jobs from people using pickaxes?

5 points

If we think of production as costing land, labour, and capital, then more efficient methods of production would likely swap labour for capital. In that case we just tax capital gains like we do now (only properly, without the loopholes). No need to complicate it past that.

-1 points

I’m not sure how feasible it is but I’ve seen a sort of “minimum wage” for robots suggested which is paid to the government as tax.

76 points

“Massive Trouble”

Step 1 - Scrape everyone’s data to make your LLM and land a high-profile deal worth $10B
Step 2 - Get sued by everyone whose data you scraped
Step 3 - Settle, and everyone in the class is eligible for a $5 ChatGPT-4 credit
Step 4 - Bask in the influx of new data
Step 5 - Profit

32 points

I posted on the public internet with the intent and understanding that it would be crawled by systems for all kinds of things. If I don’t want content to be grabbed, I don’t publish it publicly.

You can’t easily have it both ways, IMO. Even with systems that do strong PKI, if you want the world in general to see it, you are giving up a certain amount of control over how the content gets used.

Law does not really matter here as much as people would like to apply it; this is simply how public content will be used. Go post in a walled garden if you don’t want to get scraped, but remember the corollary: your reach, your voice, is limited to the walls of that garden.

29 points

What you said makes a lot of sense. But here’s the catch: it assumes OpenAI checked the licensing for all the stuff they grabbed. And I can guarantee you they didn’t.

It’s damn near impossible to automatically check the licensing for all the stuff they got, so we know for a fact they got stuff whose licensing does not allow it to be used this way. Microsoft has already been sued over Copilot, and these lawsuits will keep coming. Even assuming they somehow managed to only grab legit material, and used excellent legal advisors who assured them it would stand in court, it’s definitely impossible to tell what piece of what goes where after it becomes an LLM token, and also impossible to tell what future lawsuits will decide about it.

Where does that leave OpenAI? With the good ol’ “I grabbed something off the internet because I could”. Why does that sound familiar? It’s something people have been doing since the internet was invented; it’s commonly referred to as “piracy”. But it’s supposed to be wrong and illegal. Well, either it’s wrong and illegal for everybody or it isn’t for anybody.

7 points

There were court cases around this very thing with Google and the Web Archive. I suspect their legal team is expecting similar precedent, with the issue coming down to the individual and how they use the index. Example: using it to make my own unique character (easily done) vs. making an easy and obvious rip-off of a Disney property. The same tests can be applied; the question IMO isn’t about the index that is built here. I can memorize a lot (some people have actual eidetic memory) and synthesize from it too, which is protected, and I can copyright my own mental outputs. The disposition of this type of output vs. mechanical outputs is where I expect things will end up being argued.

I’m not going to say I’m 100% right here, we are in a strange timeline, but there is precedent for what OAI is doing IMO.

4 points

The difference between piracy and having your content used to train a generative model is that in the latter case, the content isn’t redistributed. It’s like downloading a movie from Netflix (and eventually distributing it for free) vs. watching a movie on Netflix and using it as inspiration to make your own movie.

The legality of it all is unclear, and most of that is because the technology evolved so quickly that the legal framework is just not equipped to deal with it. That’s despite the obvious moral issues with scraping artists’ content.

4 points

Yes, notice I said “Scrape” and not “steal” :)

1 point

Absolutely, just providing some more background from my perspective.

59 points

People talk about OpenAI as if it’s some utopian saviour that’s going to revolutionise society, when in reality it’s a large corporation flooding the internet with terrible low-quality content using machine learning models that have existed for years. And the fields it is “automating” are creative ones that specifically require a human touch, like art and writing. Large language models and image generation aren’t going to improve anything. They’re not “AI” and they never will be. Hopefully when AI does exist and does start automating everything, we’ll have a better economic system though :D

-16 points

The thing that amazes me the most about AI discourse is that we all learned in Theory of Computation that general AI is impossible. My best guess is that people with a CS degree who believe in AI slept through all their classes.

33 points

we all learned in Theory of Computation that general AI is impossible.

I strongly suspect it is you who has misunderstood your CS courses. Can you provide some concrete evidence for why general AI is impossible?

-8 points

Evidence, not really, but that’s kind of meaningless here since we’re talking theory of computation. It’s a direct consequence of the undecidability of the halting problem. Mathematical analysis of loops cannot be done in general because loops don’t take on any particular value; if they did, the halting problem would be decidable. Given that writing a computer program requires an exact specification, which cannot be provided for the general analysis of computer programs, general AI trips and falls at the very first hurdle: being able to write other computer programs. Which should be a simple task compared to the other things people expect of it.

Yes, there’s more complexity here (what about compiler optimization, or Rust’s borrow checker?) which I don’t care to get into at the moment; suffice it to say, those only operate under certain special conditions. To posit general AI, you need to think bigger than basic-block instruction reordering.

This stuff should all be obvious, but here we are.
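For reference, the undecidability result invoked above has a short classical proof by diagonalization, sketched here in Python; the oracle passed to `diagonalize` is hypothetical by construction, since the point is that no correct one can exist:

```python
# Classic halting-problem diagonalization, written as code.
# Suppose halts(f, x) were a total, always-correct predicate for
# "f(x) eventually halts". Then we could build a program it misjudges:
def diagonalize(halts):
    def trouble(f):
        if halts(f, f):        # oracle says f(f) halts...
            while True:        # ...so loop forever instead
                pass
        return "halted"        # oracle says f(f) loops, so halt at once
    return trouble

# For any concrete oracle, trouble(trouble) does the opposite of the
# oracle's verdict. Demo with an oracle that always answers "never halts":
t = diagonalize(lambda f, x: False)
assert t(t) == "halted"        # it halted, contradicting the oracle
```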

20 points

The existence of natural intelligence is the proof that artificial intelligence is possible.

11 points

We can simulate all manner of physics using a computer, but we can’t simulate a brain using a computer? I’m having a real hard time believing that. Brains aren’t magic.

-2 points

Computer numerical simulation is a different kind of shell game from AI. The only reason it’s done is because most differential equations aren’t solvable in the ordinary sense, so instead they’re discretized and approximated. Zeno’s paradox for the modern world. Since the discretization doesn’t work out exactly, the results are then hacked to look right. This is also why they always want more FLOPS: the belief is that if you just discretize finely enough, you’ll eventually reach the infinite (or the infinitesimal).

This also should not fill you with hope for general AI.
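To make the discretize-and-refine point concrete, here is a toy sketch (not from the comment itself): forward Euler applied to dy/dt = y with y(0) = 1, whose exact value at t = 1 is e. Refining the step size shrinks the error, but no finite grid eliminates it.

```python
import math

# Forward-Euler integration of y' = y from t = 0 to t_end,
# starting at y(0) = 1, using a fixed number of steps.
def euler(t_end, steps):
    y = 1.0
    h = t_end / steps          # step size: finer grid = more steps
    for _ in range(steps):
        y += h * y             # one Euler update for y' = y
    return y

exact = math.e                 # true value of y(1)
errors = [abs(euler(1.0, n) - exact) for n in (10, 100, 1000)]
# each 10x refinement cuts the error by roughly 10x, yet it never hits zero
```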

10 points

It’s all buzzword exaggerations. It’s marketing.

Remember when hoverboards were for things that actually hover instead of some motorized bullshit on two wheels? Yeah, same bullshit.

44 points

If this lawsuit is ruled in favor of the plaintiff, it might lead to lawsuits against those who have collected and used private data more maliciously, from advertisement-targeting services to ALPR services that reveal to law enforcement your driving habits.

10 points

So some of the most profitable corporations in the world? In that case this lawsuit isn’t going anywhere.

41 points

Good. Technology always makes strides before the law can catch up. The issue is that multi-million-dollar companies use these gaps in the law to get away with legally gray and morally black actions, all in the name of profit.

Edit: This video is the best way to educate yourself on why AI art and writing are bad when they steal from people, as most AI programs currently do. I know it’s long, but it’s broken up into chapters if you can’t watch the whole thing.

19 points

Totally agree. I don’t care that my data was used for training, but I do care that it’s used for profit in a way that only a company with big-budget lawyers can manage.

4 points

But if we’re drawing the line at “did it for profit”, how much technological advancement will happen? I suspect most advancement is profit driven. Obviously people should be paid for any work they actually put in, but we’re talking about content on the internet that you willingly create for fun and the fact it’s used by someone else for profit is a side thing.

And quite frankly, there’s no way to pay you for this. No company is gonna pay you to use your social media comments to train their AI and even if they did, your share would likely be pennies at best. The only people who would get paid would be companies like reddit and Twitter, which would just write into their terms of service that they’re allowed to do that (and I mean, they already use your data for targeting ads and it’s of course visible to anyone on the internet).

So it’s really a choice between helping train AI (which could be viewed as a net benefit for society, depending on how you view those AIs) vs simply not helping train them.

Also, if we’re requiring payment, frankly only the super big AI companies can afford to pay anything at all. Training an AI is already so expensive that it’s hard enough for small players to enter this business without having to pay for training data too (and at insane prices, if Twitter and Reddit are any indication).

8 points

Hundreds of projects on GitHub are supported by donations; innovation happens even without profit incentives. It may slow down the pace of AI development, but I am willing to wait another decade for AIs if it protects user data and lets regulation catch up.

2 points

Reddit is currently trying to monetize their users’ comments and other content by charging for API access. That creates a system where only the corporations profit, and the users generating the content are not only unpaid but expected to pay directly or be monetized through ads. And if the users want to use the technology trained on their content, they also have to pay for it.

Sure seems like a great deal for corporations, with users getting fleeced as much as possible.

12 points

I’m honestly at a loss for why people are so up in arms about OAI using this practice and not Google or Facebook or Microsoft, etc. It really seems we’re applying a double standard just because people are a bit pissed at OpenAI for a variety of reasons, or maybe just vaguely mad at the monetary scale of “tech giants”.

My 2 cents: I don’t think content posted on the open internet (especially content produced by users on a free platform, claimed not by those individuals but by the platforms themselves) should be litigated over when that information isn’t even being reproduced but used in derivative works. I think it’s conceptually similar to an individual reading a library of books to become a writer and charging for the content they produce.

I would think a piracy community would be against platforms claiming ownership over user-generated content at all.

1 point

https://youtu.be/9xJCzKdPyCo

This video can answer just about any question you ask. It’s long, but it’s split up into chapters so you can see which questions he answers in each chapter. I do recommend you watch the whole thing if you can. There’s a lot of information I found very insightful and thought-provoking.

1 point

While I appreciate this gentleman’s copyright experience, I do have a couple of comments:

  • His analysis is primarily from a legal perspective. While I don’t doubt there is legal precedent for protection under copyright law, my personal opinion is that copyright is a capitalist conception, dependent on an economic reality I fundamentally disagree with. Copyright is meant to protect the livelihoods of artists, but I don’t think anyone’s livelihood should depend on having to sell labor. More often, copyright is used to protect the financial interests of large businesses, not individual artists. The current litigation is between large media companies and OAI; any settlement isn’t likely to remunerate much more than a couple of dollars to individual artists, and we can’t turn back the clock to before AI could displace artists’ jobs, either.

  • I’m not a lawyer, but his legal argument is a little iffy to me. Unless I misunderstood something, he’s resting his case on a distinction between human inspiration (i.e. creative inspiration in derivative works) and how AI functions practically (i.e. AI has no subjective “experience”, so it cannot bring its own “hand” to a derivative work). I don’t see this as a concrete argument, but even if I did, it is still no different from individual artists creating derivative works and crossing the line into copyright infringement. I don’t see how this argument can be blanket-applied to the use of AI, rather than to individual cases of someone using AI on a project that draws too much from a derivative work.

The line is even less clear when discussing LLMs as opposed to T2I or I2I models, which I believe is what is at issue in the lawsuit against OAI. Unlike images from DeviantArt and Instagram, text datasets from sources like Reddit, Wikipedia, and Twitter aren’t protected under copyright like visual media. The legal argument against the use of training data drawn from public sources is even less clear, and is even further removed from protecting individual users; it is instead a question of protecting social media sites with questionable legal claims to begin with. This is the point I’d expect this particular community to take issue with: I don’t think Reddit or Twitter should be able to claim ownership over their users’ content, nor do I think anyone should be able to revoke consent over fair use just because it threatens our status quo capitalist system.

AI isn’t going away anytime soon, and litigating over the ownership of the training data will only serve to solidify the dominant hold a handful of large tech giants have over our economy. I would rather see large AI models nationalized, or otherwise protected from monopolization.


Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ

!piracy@lemmy.dbzer0.com

⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.
