224 points

Reminder that this is made by Ben Zhao, the University of Chicago professor who stole open source code for his last data poisoning scheme.

67 points

Pardon my ignorance but how do you steal code if it’s open source?

225 points

You don’t follow the license that it was distributed under.

A common case: you use open source code in your project, and that code is under a license that requires your project to be open source as well, but you keep yours closed source.

2 points

I still wouldn’t call it stealing. I guess “broke open source code licenses” doesn’t have the same impact, but I’d prefer accuracy.

78 points

He took GPLv3 code. GPLv3 is a copyleft license that requires you to share your source code and license your project under the same terms as the code you used; you also can’t distribute your project as binary-only or proprietary software. When pressed, they only released the code for their front end, remaining in violation of GPLv3.

5 points

Probably the reason they’re moving to a web offering. They could just take down the binary files and be GPL compliant; this whole thing is so stupid.

25 points

And as I said there, it is utterly hypocritical for him to sell snake oil to artists, allegedly to help them fight copyright violations, while committing actual copyright violations.

4 points

That ToS would be sus in any other situation.

106 points

Is there a similar tool that will “poison” my personal tracked data? Like, I know I’m going to be tracked and have a profile built on me by nearly everywhere online. Is there a tool that I can use to muddy that profile so it doesn’t know if I’m a trans Brazilian pet store owner, a Nigerian bowling alley systems engineer, or a Beverly Hills sanitation worker who moonlights as a practice subject for budding proctologists?

126 points

The only way to taint your behavioral data so that you don’t get lumped into a targetable cohort is to behave like a maniac. As I’ve said in a past comment here, when you fill out forms, pretend your gender, race, and age are fluid. Also, pretend you’re nomadic. Then behave erratically as fuck when shopping online - pay for bibles, butt plugs, taxidermy, and PETA donations.

Your data will be absolute trash. You’ll also be miserable because you’re going to be visiting the Amazon drop off center with gag balls and porcelain Jesus figurines to return every week.

43 points

> Then behave erratically as fuck when shopping online - pay for bibles, butt plugs, taxidermy, and PETA donations.

…in the same transaction. It all needs to be bought and then shipped together. Not only to fuck with the algorithm, but also to fuck with the delivery guy. Because we usually know what you ordered. Especially when it’s in the soft bag packaging. Might as well make everyone outside your personal circle think you’re a bit psychologically disturbed, just to be safe.

20 points

How? Aren’t most items in boxes even in the bags? It’s not like they just toss a butt plug into a bag and ship it…right?

8 points
Deleted by creator

Maybe have a look at using AdNauseam. 👌😁

34 points

The browser addon “AdNauseam” can help with that, although it’s not a complete solution.

27 points

That and TrackMeNot.

It searches random shit in the background.

https://www.trackmenot.io/

5 points

Yep, these are the best tools for that currently.

3 points

That looks useful for those wanting to rack up MS Rewards points…

…if you can get them, that is

2 points

Boy I sure wish they had these for iOS.

19 points

> Is there a similar tool that will “poison” my personal tracked data? Like, I know I’m going to be tracked and have a profile built on me by nearly everywhere online. Is there a tool that I can use to muddy that profile so it doesn’t know if I’m a trans Brazilian pet store owner, a Nigerian bowling alley systems engineer, or a Beverly Hills sanitation worker who moonlights as a practice subject for budding proctologists?

Have you considered just being utterly incoherent, and not making sense as a person? That could work.

27 points

According to my exes, yes.

2 points

I have tricked the internet into thinking I speak Spanish/Portuguese… I really don’t. But figured when I get served ads, I may as well learn.

11 points

Mbyae try siunlhffg the mldide lterets of ervey wrod? I wnedor waht taht deos to a luaangge medol?

9 points

I guess it depends what your threat model is.

If you don’t like advertising, then you’re just piling a bunch of extra interests/demographics in there. It’ll remain roughly as valuable as it was before.

If you’re concerned about privacy and state actors, your activity would just increase. Anything that would trigger state interest would remain, so you’d presumably receive the same level of interest. Worse, if you aren’t currently of interest, there’s a possibility randomly generated traffic would be flagged by your adversary and increase their level of interest in you.

7 points

We all know you’re the last one.

3 points

> or a Beverly Hills sanitation worker who moonlights as a practice subject for budding proctologists?

Yeah, it’s definitely the last one 😆

7 points

There are programs and plugins you can download that will open a bunch of random websites to throw off tracking programs.

3 points

Idk if this is what you’re looking for, but it might be worth taking a look:

https://github.com/eth0izzle/Needl

“Your ISP is most likely tracking your browsing habits and selling them to marketing agencies (albeit anonymised). Or worse, making your browsing history available to law enforcement at the hint of a Subpoena. Needl will generate random Internet traffic in an attempt to conceal your legitimate traffic, essentially making your data the Needle in the haystack and thus harder to find. The goal is to make it harder for your ISP, government, etc to track your browsing history and habits.”
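
The core idea is simple enough to sketch (a toy illustration of the concept only, not Needl’s actual implementation; the decoy list and `generate_noise_traffic` helper are made up for this example):

```python
import random
import time
import requests

# A made-up decoy pool; a real tool would rotate through much larger lists.
DECOY_SITES = [
    "https://en.wikipedia.org/wiki/Special:Random",
    "https://www.bbc.com",
    "https://www.gutenberg.org/ebooks/search/?query=novel",
]

def generate_noise_traffic():
    """Endlessly fetch random pages at human-ish random intervals, so real
    browsing is buried in a haystack of plausible-looking requests."""
    while True:
        try:
            requests.get(random.choice(DECOY_SITES), timeout=10)
        except requests.RequestException:
            pass  # a failed decoy request costs nothing; move on
        time.sleep(random.uniform(5, 120))  # avoid a machine-regular cadence
```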

2 points

it never know if am really that bad at english

[.__.]

2 points
Deleted by creator
1 point

r/BrandNewSentence

90 points

> The tool’s creators are seeking to make it so that AI model developers must pay artists to train on data from them that is uncorrupted.

That’s not something a technical solution will work for. We need copyright laws to be updated.

4 points

Yeah, that’s what I’m saying - our current copyright laws are insufficient to deal with AI art generation.

11 points

They aren’t insufficient; they are working just fine. In the US, fair use balances the interests of copyright holders with the public’s right to access and use information. There are rights people can maintain over their work, and the rights they do not maintain have always been to the benefit of self-expression and discussion. We shouldn’t be trying to make that any worse.

7 points

The issue is simply reproduction of original works.

Plenty of people mimic the style of other artists. They do this by studying the style of the artist they intend to mimic. Why is it different when a machine does the same thing?

3 points

No, the issue is commercial use of copyrighted material as data to train the models.

0 points

It’s different because a machine can be replicated and can produce results at a rate that hundreds of humans can’t match. If a human wants to replicate your art style, they have to invest a lot of time into learning art and practicing your style. A machine doesn’t have to do these things.

This would be fine if we weren’t living in a capitalist society, but since we do, this will only result in further transfer of assets towards the rich.

0 points

It’s not. People are just afraid of being replaced, especially when they weren’t that original or creative in the first place.

3 points

Honestly, it extends beyond creative works.

OpenAI should not be held back from subscribing to a research publication, or buying college textbooks, etc. As long as the original works are not reproduced and the underlying concepts are applied, there are no intellectual property issues. You can’t even say the commercial application of the text is the issue, because I can go to school and use my knowledge to start a company.

I understand that in some select scenarios, ChatGPT has been tricked into outputting training data. Seems to me they should focus on fixing that, as it would avoid IP issues moving forward.

0 points

AI image creation tools are apparently both artistically empty, incapable of creating anything artistically interesting, and also an existential threat to visual artists. Hmm, I wonder what this says about the artistic merits of the work of furry porn commission artist #7302.

Retail workers can be replaced with self checkout, translators can be replaced with machine translation, auto workers can be replaced with robotic arms, specialist machinists can be replaced with CNC mills. But illustrators must be where we draw the line.

7 points

Disney lawyers just started salivating

11 points

Seems like Disney is as eager to adopt this technology as anyone.

A few goofy Steamboat Willie knock-offs pale beside the benefit of axing half your art department every few years, until everything is functionally procedural generation.

9 points

They’re playing both sides. Who do you think wins when model training becomes prohibitively expensive for regular people? Mega corporations already own datasets, and have the money to buy more. And that’s before they make users sign predatory ToS allowing them exclusive access to user data, effectively selling our own data back to us.

Regular people, who could have had access to a competitive, corporate-independent tool for creativity, education, entertainment, and social mobility, would instead be left worse off and with less than where we started.

0 points

copyright laws need to be abolished

26 points

That would make it harder for creative people to produce things and make money from them. Abolishing copyright isn’t the answer; we still need a system like that.

A shorter period of copyright would encourage more new content, as creative industries could no longer rely on old, outdated work.

-18 points

> That would make it harder for creative people to produce things and make money from them

no, it would make it easier.

it would be harder to stop people from making money on creative works.

14 points

That would be an update; I’m not sure it would be a good thing. As an artist, I want to be able to tell where my work is used and where it isn’t. It would suck to find something of mine used in fascist propaganda or something.

-13 points

> As an artist, I want to be able to tell where my work is used and where it isn’t.

that would be nice. a government-enforced monopoly isn’t an ethical vehicle to achieve your goal.

3 points

Truly a “Which Way White Man” moment.

I’m old enough to remember people swearing left, right, and center that copyright and IP law being aggressively enforced against social media content has helped corner the market and destroy careers. I’m also well aware of how often images from DeviantArt and other public art venues have been scalped and misappropriated even outside the scope of modern generative AI. And how production houses have outsourced talent to digital sweatshops in the Pacific Rim, Sub-Saharan Africa, and Latin America, where you can pay pennies for professional reprints and adaptations.

It seems like the problem is bigger than just “Does AI art exist?” and “Can copyright laws be changed?” because the real root of the problem is the exploitation of artists generally speaking. When exploitation generates an enormous profit motive, what are artists to do?

1 point

What is a “which way white man” moment?

-1 points

They dutifully note that, this is the next best thing.

59 points

Explanation of how this works.

These “AI models” (meaning the free and open Stable Diffusion in particular) consist of different parts. The important parts here are the VAE and the actual “image maker” (U-Net).

A VAE (Variational AutoEncoder) is a kind of AI that can be used to compress data. In image generators, a VAE is used to compress the images. The actual image AI only works on the smaller, compressed image (the latent representation), which means it takes a less powerful computer (and uses less energy). It’s that which makes it possible to run Stable Diffusion at home.
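
For the curious, that compression step looks roughly like this (a minimal sketch using the diffusers library and one of StabilityAI’s publicly released SD VAEs; the model name, file name, and image size are just illustrative choices):

```python
import torch
import numpy as np
from PIL import Image
from diffusers import AutoencoderKL

# A publicly released Stable Diffusion VAE - the component this attack targets.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Load a 512x512 RGB image and scale pixel values to [-1, 1].
img = Image.open("cat.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
x = x.permute(2, 0, 1).unsqueeze(0)  # shape (1, 3, 512, 512)

with torch.no_grad():
    # Encode: 512x512x3 pixels -> 4x64x64 latent, roughly a 48x compression.
    latent = vae.encode(x).latent_dist.sample()
    # The "image maker" (U-Net) only ever sees `latent`, never the pixels.
    recon = vae.decode(latent).sample  # back to (1, 3, 512, 512)
```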

This attack targets the VAE. The image is altered so that the latent representation is that of a very different image, but still roughly the same to humans. Say, you take images of a cat and of a dog. You put both of them through the VAE to get the latent representation. Now you alter the image of the cat until its latent representation is similar to that of the dog. You alter it only in small ways and use methods to check that it still looks similar for humans. So, what the actual image maker AI “sees” is very different from the image the human sees.
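
In rough PyTorch terms, the loop described above might look like this (a hedged sketch of the general idea, not Nightshade’s actual code; lpips is one off-the-shelf stand-in for the “still looks similar for humans” check, load_image is a hypothetical helper returning a (1, 3, 512, 512) tensor in [-1, 1], and vae is the model from the previous sketch):

```python
import torch
import lpips  # off-the-shelf perceptual similarity metric

perceptual = lpips.LPIPS(net="vgg")

cat = load_image("cat.png")  # hypothetical helper, see above
dog = load_image("dog.png")

# Where we want the cat's latent to land: on top of the dog's.
with torch.no_grad():
    target = vae.encode(dog).latent_dist.mean

delta = torch.zeros_like(cat, requires_grad=True)  # the small alteration
opt = torch.optim.Adam([delta], lr=1e-2)

for step in range(500):
    poisoned = (cat + delta).clamp(-1, 1)
    latent = vae.encode(poisoned).latent_dist.mean
    # Pull the latent toward the dog's while a perceptual penalty keeps the
    # poisoned image looking like the original cat to human eyes.
    loss = (latent - target).pow(2).mean() + 0.1 * perceptual(poisoned, cat).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```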

Obviously, this only works if you have access to the VAE used by the image generator. So, it only works against open source AI; basically only Stable Diffusion at this point. Companies that use a closed source VAE cannot be attacked in this way.


I guess it makes sense if your ideology is that information must be owned and everything should make money for someone. I guess some people see cyberpunk dystopia as a desirable future. I wonder if it bothers them that all the tools they used are free (EG the method to check if images are similar to humans).

It doesn’t seem to be a very effective attack but it may have some long-term PR effect. Training an AI costs a fair amount of money. People who give that away for free probably still have some ulterior motive, such as being liked. If instead you get the full hate of a few anarcho-capitalists that threaten digital vandalism, you may be deterred. Well, my two cents.

20 points

> So, it only works against open source AI; basically only Stable Diffusion at this point.

I very much doubt it even works against the multitude of VAEs out there. There are not just the ones derived from StabilityAI’s models, but also ones simply intended to be faster (at a loss of quality): TAESD can also encode, and has a completely different architecture, thus is completely unlikely to be fooled by the same attack vector. Failing that, you can use a simple affine transformation to convert between latent and RGB space (that’s what “latent2rgb” is) and compare outputs to know whether the big VAE model got fooled into generating something unrelated. That thing just doesn’t have any attack surface; there are several magnitudes too few weights in there.
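
To illustrate the latent2rgb point (a sketch; the 4x3 matrix below stands in for the kind of tiny hand-fitted preview projection UIs ship for SD-style 4-channel latents, with illustrative values, and `vae_probably_fooled` is a made-up helper name):

```python
import torch
import torch.nn.functional as F

# Tiny affine latent->RGB projection; 12 weights, no attack surface to speak of.
LATENT2RGB = torch.tensor([
    [ 0.35,  0.23,  0.32],
    [ 0.33,  0.50,  0.24],
    [-0.28,  0.18,  0.27],
    [-0.21, -0.26, -0.72],
])

def vae_probably_fooled(image, latent, threshold=0.5):
    """Cheap consistency check: the affine preview of the latent should still
    resemble a downscaled copy of the input image; if it doesn't, the big VAE
    was likely steered toward something unrelated."""
    preview = torch.einsum("bchw,cd->bdhw", latent, LATENT2RGB)  # (1, 3, h, w)
    small = F.interpolate(image, size=preview.shape[-2:], mode="bilinear")
    norm = lambda t: (t - t.mean()) / (t.std() + 1e-8)
    return (norm(preview) - norm(small)).abs().mean().item() > threshold
```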

Which means that there’s an undefeatable way to detect that the VAE was defeated. Which means it’s only a matter of processing power until Nightshade is defeated, no human input needed. They’ll of course again train and try to fool the now hardened VAE, starting another round, ultimately achieving nothing but making the VAE harder and harder to defeat.

It’s like with Russia: They’ve already lost the war but they haven’t noticed, yet – though I wouldn’t be too sure that Nightshade devs themselves aren’t aware of that: What they’re doing is a powerful way to grift a lot of money from artists without a technical bone in their body.

4 points

Are DALL-E 3 and Midjourney using other methods?

8 points

Those companies don’t make the technical details public and I don’t follow the leaks and rumors. They almost certainly use, broadly, the same approach (latent diffusion). That is, their AIs work with a compressed version of the image to save on computing power.

3 points

o7 General Effort

2 points

Yeah. Not that it’s the fault of artists that capitalism exists in its current form. Their art is the fruit of their labor, and therefore means should be taken to ensure that their labor is properly compensated. And I’m a Marxist anarchist; no part of me agrees with any part of the capitalist system. But artists are effectively workers, and we enjoy the fruits of their labor. They are rarely fairly compensated for their work. In this particular instance, under the system we live in, artists’ rights should be prioritized over

I’m all for janky (getting less janky as time goes on) AI images, but I don’t understand why it’s so hard to ask artists’ permission first to use their data. We already maintain public domain image databases, and loads of artists have in the past allowed their art to be used freely for any purpose. How hard is it to gather a database of art whose creators have agreed to let it be used for AI? All the time we’ve (the collective we) been arguing over this could’ve been spent implementing a system to create such a database.

9 points

You should check out this article by Kit Walsh, a senior staff attorney at the EFF. The EFF is a digital rights group who recently won a historic case: border guards now need a warrant to search your phone. It should help clear some things up for you.

1 point

Fair enough, and I can’t claim to be a fan of copyright law or how it’s used. Maybe what I’m really talking about is a standard of ethics? Or some laws governing the usage of image- and text-generating AI specifically, as opposed to copyright law. Like, just straight up a law making it mandatory for AI to provide a list of all the data it used, as well as proof that the sources of that data consented to its use in training the AI.

5 points

That’s not quite right. A traditional worker is someone who operates machines they don’t own to make products they don’t own. Artists who are employed do not own the copyrights to what they make. These employed artists are like workers, in that sense.

Copyrights are “intellectual property”. If one needed permission (mostly meaning: paying for it), then the money would go to the property owners. These worker-artists would not receive anything. Note that, on the whole, the owners have already made what profit they could expect. Say, if it’s stills from a movie, then that movie has already made a profit (or not).

People who use their own tools and own their own product (EG artisans in Marx’s time) are members of the Petite Bourgeoisie. I think a Marxist analysis of the class dynamics would be fruitful here, but it’s beyond me.

The spoilered bit is something I have written about the NYT lawsuit. I think it’s illuminating here, too.

spoiler

The NYT wants money for the use of its “intellectual property”. This is about money for property owners. When building rents go up, you wouldn’t expect construction workers to benefit, right?

In fact, more money for property owners means that workers lose out, because where else is the money going to come from? (well, “money”)

AI, like all previous forms of automation, allows us to produce more and better goods and services with the same amount of labor. On average, society becomes richer. Whether these gains go to the rich, or are more evenly distributed, is a choice that we, as a society, make. It’s a matter of law, not technology.

The NYT lawsuit is about sending these gains to the rich. The NYT has already made its money from its articles. The authors were paid, in full, and will not get any more money. Giving money to these property owners will not make society any richer. It just moves wealth to property owners for being property owners. It’s about more money for the rich.

If OpenAI has to pay these property owners for no additional labor, then it will eventually have to increase subscription fees to balance the cash flow. People, who pay a subscription, probably feel that it benefits them, whether they use it for creative writing, programming, or entertainment. They must feel that the benefit is worth, at least, that much in terms of money.

So, the subscription fees represent a part of the gains to society. If a part of these subscription fees is paid to property owners, who did not contribute anything, then that means that this part of the social gains is funneled to property owners, IE mainly the ultra-rich, simply for being owners/ultra-rich.


> why it’s so hard to ask artists’ permission first to use their data.

SD was trained on images from the internet. Anything. There are screenshots, charts and pure text jpgs in there. There’s product images from shopping sites and also just ordinary snapshots that someone posted. The people with the biggest individual contribution are almost certainly professional photographers. SD is not built on what one usually calls art (with apologies to photographers). An influencer who has a lot of good, well tagged images on the net has made a more positive contribution than someone who makes abstract art or stick figure comics. And let’s not forget the labor of those who tagged those images.

You could not practically get permission from these tens or hundreds of millions of people. It would really be a shame, because the original SD reveals a lot about the stereotypes and biases on the net.

Using permissively licensed images wouldn’t have helped a lot. I have seen enough outrage over datasets with exactly such material. People say that’s not what they had in mind when they gave these wide permissions.

Practically, look at Wikimedia. There are so many images there which are “pirated”. Wikimedia can just take them down in response to a DMCA notice. Well, you can’t remove an image from a trained AI model; it’s not in there (if everything has worked). So what now? If that means the model becomes illegal, then you just can’t have a model trained on such a database.

1 point

> People who use their own tools and own their own product (EG artisans in Marx’s time) are members of the Petite Bourgeoisie. I think a Marxist analysis of the class dynamics would be fruitful here, but it’s beyond me.

Please don’t. Marxists, at least Marxist-Leninists, tend to start talking increasing amounts of nonsense once the Petite Bourgeoisie and Lumpen get involved.

In any case, the whole thing is (as Marx would tell you, but Marxists ignore) a function of one’s societal relations, not of the individual person or job. That relation might change from hour to hour (e.g. if you have a day job), and “does not have an employment contract” doesn’t imply “does not depend on capital for survival” – it’s perfectly possible as an artist, or pipe fitter, to own your own means of production (computer, metal tongs) and be, as a contractor, in a very similar relationship to capital as the Lumpen day-labourer: to have no say in the greater work that gets created, to be told “do this, or starve”, to be treated as an easily replaceable cog. That may even be the case if you have employees of your own. The question, and this is why Anarchist analysis >>> Marxist analysis, is whether you’re beholden to an unjust hierarchy – in this case, the one created by capital ownership – not whether you happen to own a screwdriver. As e.g. a farmer you might own millions upon millions in means of production; that doesn’t mean supermarket chains aren’t squeezing your bones dry and you can barely afford your utility bills. Capitalism is unjust hierarchy all the way up and down.

> Well, you can’t remove an image from a trained AI model; it’s not in there (if everything has worked). So what now? If that means the model becomes illegal, then you just can’t have a model trained on such a database.

I also can’t possibly unhear this; that doesn’t mean that my mind, or any music I might compose, is illegal. If it is overfitted in my mind and I want to compose and publish music, then I’ll have to pay attention that my stuff is sufficiently different – I’d have to run an adversarial model against myself, so to speak – if I don’t want to end up having to pay royalties. If I just want to have it bouncing around my head and sing it in the shower, then I might be singing copyrighted material, but there’s no obligation for me to pay royalties either, as many aspects of copyright necessitate things such as publication or the ability to damage the original author’s income.

56 points

Begun, the AI Wars have.

23 points

Excited to see the guys that made Nightshade get sued in a Silicon Valley district court, because they’re something something mumble mumble intellectual property national security.

47 points

They already stole GPLv2 code for their last data poisoning scheme and remain in violation of that license. They’re just grifters.

13 points

sigh Grifts all the way down, it seems.

-9 points

If their targets don’t care about intellectual property, why should they?

6 points

I didn’t have that on my 2020 bingo card, but it has been a very long year so everything is possible.

