You are viewing a single thread.
View all comments View context
8 points

An inherent flaw in transformer architecture (what all LLMs use under the hood) is the quadratic memory cost to context. The model needs 4 times as much memory to remember its last 1000 output tokens as it needed to remember the last 500. When coding anything complex, the amount of code one has to consider quickly grows beyond these limits. At least, if you want it to work.

This is a fundamental flaw with transformer - based LLMs, an inherent limit on the complexity of task they can ‘understand’. It isn’t feasible to just keep throwing memory at the problem, a fundamental change in the underlying model structure is required. This is a subject of intense research, but nothing has emerged yet.

Transformers themselves were old hat and well studied long before these models broke into the mainstream with DallE and ChatGPT.

permalink
report
parent
reply

Technology

!technology@lemmy.world

Create post

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


Community stats

  • 18K

    Monthly active users

  • 12K

    Posts

  • 528K

    Comments