Here is the text of the NIST SP 800-63B Digital Identity Guidelines.
Who cares? It's going to be hashed anyway. If the same user can generate the same input, it will result in the same hash. If another user can't generate the same input, well, that's really rather the point. And I can't think of a single backend, language, or framework that doesn't treat a single Unicode character as one character. Byte length of the character is irrelevant as long as you're not doing something ridiculous like intentionally parsing your input in binary and blithely assuming that every character must be 8 bits in length.
It matters for bcrypt, which has a 72-byte limit. Not characters, bytes. (scrypt, for what it's worth, accepts passwords of arbitrary length.)
That said, I also think it doesn't matter much. Reasonable-length passphrases that could be covered by the old Latin-1 charset easily fit in that. If you're talking about CJK languages, then each character is effectively a whole word, and you're packing a lot of entropy into one character. 72 bytes is already beyond what's needed for security; it's diminishing returns at that point.
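To make the byte-vs-character distinction concrete, here's a minimal Python sketch; the sample passphrases are made up, and 72 is bcrypt's documented byte cap:

```python
# Byte length vs. character length under UTF-8 -- bcrypt's 72-byte cap
# measures bytes. Sample strings below are illustrative, not real passwords.
BCRYPT_BYTE_LIMIT = 72

ascii_pw = "correct horse battery staple"
cjk_pw = "正確な馬のバッテリー"  # Japanese; each character is 3 bytes in UTF-8

assert len(ascii_pw) == len(ascii_pw.encode("utf-8"))  # 1 byte per character
assert len(cjk_pw.encode("utf-8")) == 3 * len(cjk_pw)  # 3 bytes per character

# bcrypt silently ignores input past byte 72, so two passwords that differ
# only beyond that point hash identically.
long_pw = "パ" * 30  # 30 characters, 90 bytes in UTF-8
tail_variant = long_pw + "different tail"
assert (long_pw.encode("utf-8")[:BCRYPT_BYTE_LIMIT]
        == tail_variant.encode("utf-8")[:BCRYPT_BYTE_LIMIT])
```

The ASCII passphrase above fits with room to spare; the Japanese one burns three bytes per character, but as noted, each character carries far more entropy too.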
If the same user can generate the same input, it will result in the same hash.
Yes, if. I don't know if you can guarantee that. It's all fun and games as long as you're doing English. In other languages, you get characters that can be encoded in more than one way. A user at home has a localized keyboard with a dedicated key for such a character. The same user travels across the border, gets a different language keyboard, and uses a different way to produce the character. Euro problems.
https://en.wikipedia.org/wiki/Unicode_equivalence
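A small Python sketch of the problem the article above describes: the same on-screen character in its precomposed (NFC) and decomposed (NFD) forms yields different bytes, so a naive hash locks the user out. Normalizing before hashing (SP 800-63B recommends NFKC or NFKD for exactly this reason) makes the two forms agree. SHA-256 stands in here for whatever password hash you actually use:

```python
import hashlib
import unicodedata

# Two ways to type "café": precomposed é vs. e + combining acute accent.
precomposed = "caf\u00e9"   # NFC: é is a single code point
decomposed = "cafe\u0301"   # NFD: e followed by U+0301

# Same glyphs on screen, different code point sequences and byte lengths.
assert precomposed != decomposed
assert len(precomposed.encode("utf-8")) != len(decomposed.encode("utf-8"))

# A naive hash of the raw bytes therefore differs -- a locked-out user.
h1 = hashlib.sha256(precomposed.encode("utf-8")).hexdigest()
h2 = hashlib.sha256(decomposed.encode("utf-8")).hexdigest()
assert h1 != h2

# Normalizing before hashing makes both inputs identical.
n1 = unicodedata.normalize("NFKC", precomposed)
n2 = unicodedata.normalize("NFKC", decomposed)
assert n1 == n2
```

Whether the verifier actually does this normalization step is exactly the "if" being questioned above.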
Byte length of the character is irrelevant as long as you're not doing something ridiculous like intentionally parsing your input in binary and blithely assuming that every character must be 8 bits in length.
There is always some son-of-a-bitch who doesn't get the word.
- John F. Kennedy