Which of the following sounds more reasonable?

  • I shouldn’t have to pay for the content that I use to tune my LLM model and algorithm.

  • We shouldn’t have to pay for the content we use to train and teach an AI.

By calling it AI, the corporations are able to advocate for a position that’s blatantly pro-corporate and anti-writer/artist, and trick people into supporting it under the guise of a technological development.

99 points

I think it’s the same reason the CEOs of these corporations are clamoring about their own products being doomsday devices: it gives them massive power over crafting regulatory policy, thus letting them make sure it’s favorable to their business interests.

Even more frustrating when you realize, and feel free to correct me if I’m wrong, these new “AI” programs and LLMs aren’t really novel in terms of theoretical approach: the real revolution is the amount of computing power and data to throw at them.

59 points

The funniest thing I’ve seen on this is the ChatGPT CEO, Altman, talking about how he’s a bit afraid of what they’ve created and how it needs limitations – and then when the EU begins to look at regulations, he immediately rejects the concept, to the point of threatening to leave the European market. It’s incredibly transparent what they’re doing.

Unfortunately I don’t know enough about the technology to say if the algorithms and concepts themselves are novel, but without a doubt they couldn’t exist without modern computing power capabilities.

20 points

I can tell you for a fact that there’s nothing new going on. Only the MASSIVE investment from Microsoft to allow them to train on an insane amount of data. I am no “expert” per se, but I’ve been studying and working with AI for over a decade, so feel free to judge my reply as you please.

-2 points

nothing new going on

Uhhhh the available models are improving by leaps and bounds by the month, and there’s quite a bit of tangible advancement happening every week. Even more critically, the models that can be run on a single computer are very quickly catching up to those that just a year or two ago required some percentage of a hyperscaler’s datacenter to operate.

Unless you mean to say that the current insane pace of advancement is all built on decades of research, and that a lot of the specific recent advancements happen to be fairly small innovations on previous research, infused with a crapload of cash and hype (far more than most researchers could ever dream of).

-3 points

nothing new going on

I can’t think of anything less accurate to say about LLMs other than that they’re a world-ending threat.

This is a bit like saying “The internet is a cute thing for tech nerds but will never go mainstream” in like 1995.

11 points

The concepts themselves are some 30 years old, but storage capacity and processing speed have only recently reached a point where generative AI outperforms competing solutions.

But regarding the regulation thing, I don’t know what was said or proposed, and this is just me playing devil’s advocate: but could it be that the CEO simply doesn’t agree with the specifics of the proposed regulations while still believing that some other, different kind of regulation should exist?

15 points

Certainly could be, but probably an optimistic take. Most likely they’re just trying to do what corporations have been doing for ages, which is to weaponize government policy to prevent competition. They don’t want restrictions that will materially impact their product, they want restrictions that will materially impact startups to make it more difficult for them to intrude on the established space.

-3 points

And what are they doing? As a reminder, OpenAI is a non-profit.

6 points

I thought they moved to for profit back in 2019?

24 points

Even more frustrating when you realize, and feel free to correct me if I’m wrong, these new “AI” programs and LLMs aren’t really novel in terms of theoretical approach: the real revolution is the amount of computing power and data to throw at them.

This is 100% true. LLMs, neural networks, markov chains, gradient descent, etc. etc. on down the line is nothing particularly new. They’ve collectively been studied academically for 30+ years. It’s only recently that we’ve been able to throw huge amounts of data, computing capacity, and time to tweak said models to achieve results unthinkable 10-ish years ago.

There have been efficiencies, breakthroughs, tweaks, and changes over this time too, but that’s just to be expected. Largely, though, it’s the sheer raw size/scale that has only recently become achievable.

10 points

We all remember SmarterChild…right?

5 points

No, I have clearly forgotten: What was that?

3 points

I do now!

2 points

Oh, flashbacks there. Completely forgot about this.

2 points

I remember Tay

7 points

LLMs aren’t really novel in terms of theoretical approach: the real revolution is the amount of computing power and data to throw at them.

This is 100% true. LLMs, neural networks, markov chains, gradient descent, etc. etc. on down the line is nothing particularly new. They’ve collectively been studied academically for 30+ years.

Well LLMs and particularly GPT and its competitors rely on Transformers, which is a relatively recent theoretical development in the machine learning field. Of course it’s based in prior research, and maybe there even is prior art buried in some obscure paper or 404 link, but if that’s your measure then there is no “novel theoretical approach” for anything, ever.

I mean I’ll grant that the available input data and compute for machine learning have increased exponentially, and that’s certainly an obvious factor in the improved output quality. But that’s not all there is to the current “AI” summer; general scientific progress played a non-minor part as well.

In summary, I disagree on data/compute scale being the deciding factor here, it’s deep learning architecture IMHO. The former didn’t change that much over the last half decade, the latter did.

3 points

Now, as I stated in my first comment in these threads, I don’t know terribly much about the technical details behind current LLMs, and I’m basing my comments on my layman’s reading.

Could you elaborate on what you mean about the development of deep learning architecture in recent years? I’m curious; I’m not trying to be argumentative.

5 points

Okay, I’m glad I’m not too far off the mark then (I’m not an AI expert/it’s not my field of study).

I think this also points to/is a great example of another worrying trend: the consolidation of computing power in the hands of a few large companies. Without even factoring in the development of true AI/whether that can or will happen anytime soon, the LLMs really show off the massive scale of both computational power consolidation and data harvesting by only a very few entities. I’m guessing I’m not alone here in finding that increasingly concerning, particularly since a lot of development is driving towards surveillance applications.

3 points

By that logic there was nothing novel about solid state transistors since they just did the same thing as vacuum tubes; no innovation there I guess. No new ideas came from finally having a way to pack cooler, less power-hungry, smaller components together.

8 points

LLMs are pretty novel. They are made possible by the invention of the Transformer model, which operates significantly differently from, say, an RNN.

6 points

It also plays into the hype cycle they’re trying to create. Saying you’ve made an AI is more likely to capture the attention of the masses than saying you have an LLM. Ditto for the existential doomerism that the CEOs have. Saying your tech is so powerful that it might lead to humanity’s extinction does wonders in building hype.

4 points

Agreed. And all you really need to do is browse any of the headlines from even respectable news outlets to see how well it’s working. It’s just article after article uncritically parroting whatever claims these CEOs make at face value at least 50% of the time. It’s mind-numbing.

3 points

The fear mongering is pretty ridiculous.

“AI could DESTROY HUMANITY. It’s like the ATOMIC BOMB! Look at its RAW POWER!”

AI generates an image of cats playing canasta.

“By God…”

0 points

We could say that the human brain isn’t novel in terms of biological composition: the real evolution is the size increase compared to the body.

The fact that insects exist doesn’t make us less intelligent.

But I agree with the sentiment of the argument.

34 points

IMO content created by either AI or LLMs should have a special license and be considered AI public domain (unless they can prove that they own all content the AI was trained on). Commercial content made based on content marked with this license would be subject to a flat % tax that should be applied to the product price which would be earmarked for a fund distributing to human creators (coders, writers, musicians etc.).

12 points

I think the cleaner (and most likely) outcome is that AI-generated work is considered public domain. Since public domain content can already be edited and combined and arranged to create copyrighted content, this would largely clear the path for creators to use AI more prominently in their workflows.

1 point

So I can make derivative works from commercial works, make something from that material, then release the result as public domain? I would think not.

1 point

Honestly, I’d personally prefer the latter, but there is the argument made by artists, coders and content creators. Their work is being scraped to train these AIs, which in turn makes their future work less valuable. Hence the thought of enforcing a tiny “royalty”/tax on commercial products based on AI-generated content and funneling that money back to human creators of intellectual works.

2 points

What about LLM generated content that was then edited by a human? Surely authors shouldn’t lose copyright over an entire book just because they enlisted the help of LLMs for the first draft.

1 point

If you take open source code using GNU GPL and modify it, it retains the GNU GPL license. It’s like saying it’s fine to take a book and just change some words and it’s totally not plagiarism.

2 points

Public domain is not infectious like GPL is. That being said, it seems like the parent comment has already mentioned this case, now that I’ve read them again:

public domain content can already be edited and combined and arranged to create copyrighted content

That’s fine by me. The important thing is that humans can still use AI as a legally recognized productivity tool, including using it as a way to use ideas and styles generated by other humans.

32 points

Both sound the same to me. Private companies scraping ostensibly public data to sell it. No matter how you word it, they are trying to monetize stuff that is out in the open.

7 points

I don’t see why a single human should be able to profit off learning from others but a group of humans doing it for a company cannot. This is just how humanity advances at whatever scale.

9 points

I had a comment about the morality of it at first but I pulled it out. This is not an easy question to answer. Corporations gatekeeping knowledge seems weird and dystopian, but the knowledge is out there and they are just making connections between it. It also touches on copyright and fair use.

3 points

I agree it’s a much more complicated issue than most people give it credit for.

25 points

I see it like this:

Our legal system has the concept of mechanical licensing. If your song exists, someone can demand the right to cover it and the law will favor them. The result of an LLM has less to do with your art than a cover of your song does.

There are plenty of cases of a cover eclipsing the original version of a song in popularity, and yet I have never met a single person who argues that we should get rid of the right to cover a song.

15 points

Sure, you have the legal right to cover someone else’s song without asking permission first, but you still have to pay them royalties afterwards, at fair market rates.

25 points

I’m not sure what you’re trying to say here; LLMs are absolutely under the umbrella of AI, they are 100% a form of AI. They are not AGI/STRONG AI, but they are absolutely a form of AI. There’s no “reframing” necessary.

No matter how you frame it, though, there’s always going to be a battle between the entities that want to use a large amount of data for profit (corporations) and the people who produce said content.

8 points

True, and this is the annoying thing about people unqualified to talk about AI giving their opinions online. People not involved in the industry hear “AI” and expect HAL-9000 or Ava from Ex Machina rather than the software that the weather service uses to predict if it will rain tomorrow, or the models your doctor uses to help determine your risk of heart disease.

This is compounded further when someone makes a video simplifying what an LLM is and mentioning that the latest models use it, which leads to the chimes of “bUt iT’S jUsT aN Llm BrO iTs nOt AI” and “ItS jUsT a LOaD oF DaTa aND aLGorItHMs, tHaTs NoT AI”. A little bit of knowledge is a dangerous thing.

3 points

or that people are only exposed to trivial/childish publicly available examples.

0 points

This is actually exactly what I mean. Most people hear AI and envision something much, much more complex. It’s easier to argue that HAL-9000 is like a human and should therefore be allowed to freely view book content like a human than to argue that a sophisticated LLM is like a human and should be allowed to freely view books like a human. That’s more so where I’m coming from. And politicians are stupid enough to pass laws envisioning these as HAL-9000.

7 points

On the flip side, the same battle is also fought between giant corporations that amass intellectual property and the people who want to actually use that intellectual property instead of letting it sit in some patent troll’s hoard until a lawsuit opportunity presents itself. Seeing as there are quite a few reasonably decent open-source LLMs out there, like Koala and Alpaca, also training on data freely available on the Internet, I’m actually rooting for the AI companies in this case, in the hopes of establishing a disruptive precedent.

3 points

Right, where I’m coming from is that I don’t think the personhood arguments you see for why content should be free for it really hold any water. Whatever the case on its intelligence, it isn’t comparable to humans for copyright law.


Technology

!technology@lemmy.world
