In its submission to the Australian government's review of the regulatory framework around AI, Google said that copyright law should be altered to allow generative AI systems to scrape the internet.
Except when it produces exact copies of existing works, or when it includes a recognisable signature or watermark?
The point is that if the model doesn't contain any recognisable parts of the original material it was trained on, how can it reproduce recognisable parts of the original material it was trained on?
That's sorta the point of it.
I can recreate the phrase "apple pie" in any number of styles and fonts using my hands and a writing tool. Would you say that I "contain" the phrase "apple pie"? Where is the letter "p" in my brain?
Specifically, the AI contains relationships between sets of words and relationships between lines, contrasts, and colors.
From there, it knows how to take a set of words and produce an image that proportionally replicates those line, contrast, and color relationships.
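To put that in slightly more concrete terms, here is a minimal sketch using a CLIP-style text/image encoder, which is roughly the kind of component diffusion models use to tie prompts to visual features. The model name, file name, and prompts are illustrative assumptions, not anything from this thread.

```python
# Minimal sketch of the "relationships, not copies" idea, using a CLIP-style
# text/image encoder. The image path and prompts are placeholders.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("some_photo.jpg")  # hypothetical local file
texts = ["a cubist portrait", "a photo of an apple pie"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds similarity scores between the image embedding and each
# text embedding: learned relationships between words and visual features,
# not stored copies of any training image.
print(outputs.logits_per_image.softmax(dim=-1))
```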
You can probably replicate the Getty Images watermark closely enough for it to be recognizable, but you don't contain a copy of it in the sense that people typically mean.
Likewise, because you can recognize the artist who produced a piece, you contain the same awareness of relationships between color, contrast, and line that the AI does. I could show you a Picasso you were unfamiliar with, and you'd likely know it was him based on the style.
You've been "trained" on his works, so you have internalized many of the key markers of his style. That doesn't mean you "contain" his works.
Ah, this old paper again. When it first came out, it got raked over the coals pretty thoroughly. The authors used an older, poorly trained version of Stable Diffusion that had been trained on only 160 million images, and they identified 350,000 images from the training set that had many duplicates and therefore could potentially be overfitted. They then generated 175 million images using tags commonly associated with those duplicate images.
After all that, they found 109 images in the output that looked like fuzzy versions of the input images. This is hardly a triumph of plagiarism.
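For perspective, a quick back-of-the-envelope calculation using the figures quoted above:

```python
# Figures from the comment above: 109 near-duplicate outputs from
# 175 million generations deliberately aimed at over-duplicated images.
hits = 109
generations = 175_000_000

print(hits / generations)   # ~6.2e-07
print(generations // hits)  # roughly one hit per ~1.6 million attempts
```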
As for the watermark, look closely at it. The AI clearly just replicated the idea of a Getty-like watermark; it's barely legible. What else would you expect when you train an AI on millions of images that contain a common feature, though? It's like any other common object - it thinks photographs often just naturally have a grey rectangle with those white squiggles in it, and so it tries putting them in there when it generates photographs.
These are extreme stretches, and they get dredged up every time by AI opponents. Training techniques have been refined over time to reduce overfitting (since what's the point in spending enormous amounts of GPU power to produce a badly artefacted copy of an image you already have?), so it's little wonder there aren't any newer, better papers showing problems like these.
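For what it's worth, one concrete refinement along those lines is simply deduplicating the training set so no image is seen hundreds of times. Here is a minimal sketch of how that can be done with perceptual hashing; the imagehash library, the directory name, and the distance threshold are my own illustrative choices, not anything a particular lab has confirmed using.

```python
# Sketch: drop near-duplicate images from a training set using perceptual
# hashes, so no single image is overrepresented enough to be memorised.
from pathlib import Path

import imagehash
from PIL import Image

seen = []   # perceptual hashes of images kept so far
kept = []

for path in sorted(Path("training_images").glob("*.jpg")):  # hypothetical directory
    h = imagehash.phash(Image.open(path))
    # ImageHash subtraction gives a Hamming distance; a small distance means
    # the two images are visually near-identical.
    if any(h - other <= 4 for other in seen):
        continue
    seen.append(h)
    kept.append(path)

print(f"kept {len(kept)} images after deduplication")
```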
Nevertheless, the Getty watermark is a recognisable element from the images the model was trained on, so you cannot claim that the models don't spit out images with recognisable elements from the training data.
Take a close look at the "watermark" on the AI-generated image. It's so badly mangled that you wouldn't have a clue what it says if you didn't already know what it was "supposed" to say. If that's really something you'd consider "copyrightable" then the whole world's in violation.
The only reason this is coming up in a copyright lawsuit is that Getty is using it as evidence that Stability AI used Getty images in the training set, not that they're alleging the AI is producing copyrighted images.