28 points

Did we memory-hole the whole ‘known CSAM in training data’ thing that happened a while back? When you’re vacuuming up the internet, you’re going to wind up with the nasty stuff, too. Even if the output isn’t a pixel-for-pixel match of a photo it was trained on, there’s a non-zero chance that what it’s generating is based on actual CSAM. Which is really just laundering CSAM.

31 points

IIRC, it was something like a fraction of a fraction of 1% that was CSAM. The researchers identified the images by their hashes, but the images themselves weren’t actually available in the dataset because they had already been removed from the internet.

Still, you could make AI CSAM even if you were 100% sure that none of the training images included it, since that’s what these models are made for: combining concepts without needing to have seen them together before. If you hold the AI’s hand enough with prompt engineering, textual inversion, and img2img, you can get it to generate pretty much anything. That’s the power and the danger of these things.

-5 points

What % do you think was used to generate the CSAM, though? Like, if 1% of the images were cups, it’s probably drawing on some of that to generate images of cups.

And yes, you could technically do this with no CSAM in the training material, but we don’t know whether that’s what the AI is doing, because the images used to train it were mass-scraped from the internet. They’re using massive amounts of data without filtering it, and they can’t say with certainty whether or not there is CSAM in the training material.

10 points

I didn’t know that, my bad.

7 points

Fair, but depressing. It seems like it barely registered in the news cycle.


Technology

!technology@lemmy.world


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below. To ask whether your bot can be added, please contact us.
  9. Check for duplicates before posting; duplicates may be removed.

Approved Bots


Community stats

  • 18K monthly active users
  • 12K posts
  • 553K comments