Basically, it’s a calculator that can take letters, numbers, words, sentences, and so on as input and produce a mathematically “correct” sounding output, defined by language patterns in the training data.
This core concept is in most if not all “AI” models, not just LLMs, I think.
mathematically “correct” sounding output
It’s hard to say because that’s a rather ambiguous way of describing it (“correct” could mean anything), but it is a valid way of describing its mechanisms.
“Correct” in the context of LLMs would be a token that is likely to follow the preceding sequence of tokens. In fact, the model computes a probability for every possible token, then takes a random sample according to that distribution* to choose the next token, and it repeats that until some termination condition is met. Those probabilities come from training, and the training principle is what we call maximum likelihood estimation (MLE) in machine learning (ML): we learn a distribution that makes the training data as likely as possible. MLE is indeed the basis of a lot of ML, but not all.
*Oversimplification.
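
To make the two steps concrete, here is a minimal sketch in Python. It is a toy bigram model, not how an actual LLM is built: a real LLM uses a neural network conditioned on the whole preceding sequence, and the names `fit_bigram_mle`, `generate`, and the `<end>` token are made up for this illustration. For this simple model the MLE solution happens to be plain counting, and generation is the sample-and-repeat loop described above.

```python
import random
from collections import Counter, defaultdict

END = "<end>"  # hypothetical end-of-sequence marker for this toy example


def fit_bigram_mle(sentences):
    """Fit P(next | prev) by maximum likelihood.

    For a bigram model the MLE is just relative frequency:
    P(next | prev) = count(prev, next) / count(prev).
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    model = {}
    for prev, counter in counts.items():
        total = sum(counter.values())
        model[prev] = {tok: c / total for tok, c in counter.items()}
    return model


def generate(model, start, max_tokens=20, rng=random):
    """Sample one token at a time until END or max_tokens is reached."""
    out = [start]
    while len(out) < max_tokens:
        dist = model.get(out[-1])
        if dist is None:  # token never seen as a context during training
            break
        tokens = list(dist.keys())
        weights = list(dist.values())
        # Random sample according to the learned distribution.
        nxt = rng.choices(tokens, weights=weights, k=1)[0]
        if nxt == END:  # termination condition
            break
        out.append(nxt)
    return " ".join(out)


corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = fit_bigram_mle(corpus)
print(model["the"])  # "cat" gets probability 2/3, "dog" gets 1/3
print(generate(model, "the", rng=random.Random(0)))
```

Every sentence this produces "sounds correct" relative to the three training sentences, which is all the model knows about. Nothing in it checks whether the output is true.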