The ubiquity of audio communication technologies, particularly telephone, radio, and TV, has had a significant effect on language. They further spread English around the world, making it more accessible and more necessary for lower social and economic classes; they led to the blending of dialects and the death of some smaller regional dialects; and they enabled the rapid adoption of new words and concepts.
How will LLMs affect language? Will they further cement English as the world’s dominant language or lead to the adoption of a new lingua franca? Will they be able to adapt to differences in dialects, or will they force us to further consolidate how we speak? What about programming languages? Will the model best able to generate usable code determine what language or languages will be used in the future? Thoughts and beliefs generally follow language, at least at the social scale; how will LLMs’ effects on language affect how we think and act? What we believe?
I figure they can either help or harm, depending on implementation:
Huggingface ( I always think of the “face-huggers” in Alien, when I see that name… and have NO idea why they thought that association would be a Good Thing™ ) has an LLM which apparently can do Sanskrit.
Consider, though:
All the Indigenous languages, where we’ve only actually got a partial-record of the language, and the “majority rule, minority extinguishes” “answer” of our normal process … obliterated all native speakers of that language ( partly through things like residential-schools, etc )…
now it becomes possible to have an LLM for that specific language, & to study the language, even though we’ve only got a piece of it.
This is like how we’ve sooo butchered the ecology that we can only study pieces of it now; there’s simply too much missing from what was there a few centuries ago, so we’re not looking at the original/proper thing, either in ecologies or in languages.
sigh
This wasn’t supposed to be depressing.
Consider how search-engines have altered how we have to communicate…
In order to FORCE a search-engine to consider a pair-of-words to be a single term, you have to remove all intervening spaces/hyphens/symbols from between them.
ClimatePunctuation is a single search-token, but “Climate Punctuation” is two separate, unrelated terms, which may or may not appear in the results.
It’s obscene.
I’m almost mad-enough to want legislation forcing search-engines to respect some kind of standard set of defaults ( add more terms == narrowing the search, ie defaulting to Boolean AND, as one example ),
so they’d stop enshittifying our lives while “pretending” that they’re helping.
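The two behaviours being complained about above can be sketched in a few lines. This is my own toy illustration, not any real engine’s code: whitespace splits a query into separate tokens, a joined word like ClimatePunctuation stays one token, and adding terms narrows the result set via an implicit Boolean AND.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a real engine also strips hyphens/symbols)."""
    return set(text.lower().split())

def search(query, documents):
    """Return documents containing ALL query tokens -- the implicit-AND default."""
    terms = tokenize(query)
    return [doc for doc in documents if terms <= tokenize(doc)]

docs = [
    "climate punctuation in long records",
    "punctuation marks and climate talk",
    "the ClimatePunctuation hypothesis",
]

# Two tokens: matches any doc containing both words, in any order.
print(search("climate punctuation", docs))       # first two docs
# One joined token: only the doc with the exact joined word matches.
print(search("ClimatePunctuation", docs))        # third doc only
# Adding a term narrows the results (Boolean AND).
print(search("climate punctuation records", docs))  # first doc only
```

Note the narrowing property the legislation idea asks for: every extra term can only shrink the result set, never grow it.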
( there was a Science news site which would not permit narrowing-of-search, and I hope they fscking died.
Making search unusable on a science site??
probably some “charity” who pays most of their annual-budget to their administration, & only exists for their entitlement.
I’m saying that after having encountered that religion in charities. )
Interesting:
search-engines alter our use-of-language,
social-sites do too,
LLM’s do too,
marketing/propaganda does,
astroturfing does,
… it begins looking like real events are … rather-insignificant … influences in our languages?
Hm…
The word arafed will enter the common lexicon.
We’ll never ever start a phrase with “Certainly…” anymore.
[shameless ad] This sort of question fits well !linguistics@mander.xyz [/shameless ad]
What causes the loss of a local variety (dialect or language) is not simply exposure to other varieties, but the loss of the identity associated with said variety. In other words, what led to the blending and death of those dialects wasn’t the audio communication technology - it’s economic, social, and ideological pressures, such as nationalism.
I’ll exemplify this using rhoticity in England. If telephone, radio, and TV led to the blending and death of dialects, you’d expect rhoticity in England to increase, due to exposure to American media. It didn’t - it’s decreasing:
Source for the map: it’s a collation of both maps in this article. The reason for the shift however becomes obvious when you look at identity matters: “you’re a Brit, speak like a Brit”.
The exact same reasoning applies to other languages, by the way. Caipira Portuguese features aren’t being replaced with the ones from that weird Globo TV accent, but with the ones spoken in São Paulo city; sheísmo in Argentina seems to be spreading, regardless of media from other countries; Occitan was not killed in France by simply exposing kids to French, but by making them feel ashamed of speaking Occitan.
With that out of the way, it’s hard to predict the future impact of machine text generation, be it through LLMs or better models. It’s perfectly possible that this sort of tech helps the preservation of local varieties, as LLMs are kind of good at translation; for example, I’ve noticed that Gemini is able to parse Venetian, even if unable to answer in the language.
Wait, who says it is the dominant branch of “AI development”? What does that mean, in fact? Who says it was telephone and radio that led to English hegemony? Who said thoughts and beliefs follow language? Most of the people I know in related fields seem to think that’s widely disproved. I mean, no bad questions, but there’s a TON of built-in assumptions in the OP and not all of them check out.
FWIW, I don’t know that generated language gets to change much if it’s generated by inferring likely language from human sources. At most there may be a newfound premium on using original, spontaneous-sounding language in writing, just to prove one’s humanity by distinguishing oneself from bland, generated language - but I suppose even that depends on how the tech moves forward.