125 points

🙃 compression algorithms hate this one simple trick!!

85 points

This is a joke, right? This feels like a very dumb solution. I don’t know much about UTF-8 encoding, but it sounds like Roman characters can be encoded shorter than most or all others because of a shorthand that assumes Roman characters. In that case, why not take that functionality and let a UTF-8 block specify which language makes up most of the text so that you can have that savings almost every time? I don’t see why one would want it to be random.

127 points

It’s a joke.

UTF-16 already exists, which doesn’t favor Roman characters as much, but UTF-8 is more popular because it is backward compatible with legacy ASCII.

UTF-32 also exists which has exactly equal length representation for every character.

But the thing that equalizes languages is compression.

Yes, a text written in Cyrillic will take more space in UTF-8 than one in a Roman-script language, easily double. However, this extra space is much more easily compressed by an algorithm like gzip.

So after compression, the two texts will be similarly sized and much smaller than UTF-16 or UTF-32.
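
A rough way to check this yourself, as a minimal sketch: the sample sentences and the `encoded_sizes` helper below are made up for illustration, and repeating one sentence 200 times compresses far better than real prose would, so treat the numbers as a direction rather than a benchmark.

```python
import gzip

def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of text in each encoding, raw and gzip-compressed."""
    sizes = {}
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        raw = text.encode(enc)
        sizes[enc] = len(raw)
        sizes[enc + "+gzip"] = len(gzip.compress(raw))
    return sizes

# Hypothetical sample text; repetition exaggerates how well it compresses.
latin = "The quick brown fox jumps over the lazy dog. " * 200
cyrillic = "Съешь же ещё этих мягких французских булок. " * 200

for name, sample in (("latin", latin), ("cyrillic", cyrillic)):
    print(name, encoded_sizes(sample))
```

The raw UTF-8 size for the Cyrillic sample should come out close to double its character count, while the `+gzip` figures for both samples land much closer together.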

19 points

Besides, most text on the average computer is either in some configuration file (which tends to use Latin script) or in some SGML-derived format that has a bunch of Latin characters in it. For network transmission, most things will use HTML, XML or JSON and use English-language property names even in countries that don’t speak English (see Yandex’s and Baidu’s APIs, for example).

No one is moving large amounts of .txt files around.

27 points

You’ve never worked in finance then. All our systems at work do nothing but move large amounts of txt files around.

That said, many of our clients still don’t support UTF-8, so it’s all ASCII and non-Latin alphabets are screwed. They can’t even handle characters 128-255, so even stuff like £ is unsupported.
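
To make the £ problem concrete, here is a small sketch. Latin-1 is only an assumed stand-in for "some single-byte encoding that uses 128-255"; which code page a given client actually expects is not something this thread says.

```python
pound = "£"

# U+00A3 is outside ASCII's 0-127 range, so a strict ASCII encode fails.
try:
    pound.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Assumption: Latin-1 as an example of an 8-bit encoding using 128-255.
latin1_bytes = pound.encode("latin-1")  # one byte, 0xA3
utf8_bytes = pound.encode("utf-8")      # two bytes, 0xC2 0xA3

print(ascii_ok, latin1_bytes, utf8_bytes)
```

A system limited to 7-bit ASCII has no byte value left for £ at all, which is why it has to be dropped or spelled out.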

42 points
Deleted by creator
42 points

It’ll be added when they find some free time!

You see, adding pictures of women with white canes facing right, limes and pregnant men is a very important and time-consuming job! Standardizing encoding for some human language people use is just not as important!

9 points

These are emojis, not Unicode, right?

Edit: well, TIL.

43 points

Emoji are defined as part of Unicode, so they can be encoded alongside other text:

https://unicode.org/emoji/charts/full-emoji-list.html
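
For anyone who wants to see it for themselves, a quick sketch using the emoji from the top comment (assuming a Python build whose Unicode database includes it, which any recent version should):

```python
import unicodedata

emoji = "\U0001F643"  # the upside-down face from the top comment

code_point = hex(ord(emoji))           # its Unicode code point
name = unicodedata.name(emoji)         # its official character name
utf8_len = len(emoji.encode("utf-8"))  # bytes needed in UTF-8

print(code_point, name, utf8_len)
```

It is an ordinary code point with a name, encoded like any other character outside the Basic Multilingual Plane.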

23 points

Emoji are part of Unicode. And people demand more of them, so it’s no surprise they put effort into those, even if OP thinks they are not important. Few people appreciate the Unicode Consortium for their originally intended work.

3 points

Random transphobia mixed in amongst a good point.

21 points

Oh please share, what character set?

45 points
Deleted by creator
14 points

I was not expecting the drama around it. Is the issue truly a different orthography, or is it more like a different font/ligature issue?

EDIT: forgot the article I found on it: https://restofworld.org/2021/tulu-unicode-script/

7 points

What language is that?

16 points
Deleted by creator
21 points

I immediately thought of Leeroy Jenkins in the last sentence.

https://youtu.be/mLyOj_QD4a4?si=6RhZzj8LO3tr80cT

2 points

Pretty certain it’s an intentional reference.

2 points

You’re right, and someone else might be a part of the lucky 10,000 today.

1 point

And now we have the obligatory xkcd reference. 😁

19 points

I can’t read “what a time to be alive” without hearing Two Minute Papers in my head

4 points

hold onto your papers

