Why do most (all?) websites only support usernames in ASCII? Are there any security considerations if an admin decides to start accepting Unicode usernames?
Usernames can contain letters (a-z), numbers (0-9), and periods (.). Usernames cannot contain an ampersand (&), equals sign (=), underscore (_), apostrophe ('), dash (-), plus sign (+), comma (,), angle brackets (<, >), or more than one period (.).
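A rule list like this is easy to enforce with an explicit ASCII character class. The sketch below is a hypothetical validator, not the quoted site's actual code; it assumes "letters (a-z)" means lowercase ASCII only and reads "more than one period" literally as "at most one period in total".

```python
import re

# Explicit ASCII ranges: [a-z] matches only U+0061..U+007A, never accented letters.
ALLOWED = re.compile(r"[a-z0-9.]+")

def is_valid_username(name: str) -> bool:
    """Hypothetical check: only a-z, 0-9 and '.', with at most one period."""
    if not ALLOWED.fullmatch(name):
        return False
    return name.count(".") <= 1

for candidate in ["john.smith", "john_smith", "j.o.hn", "j\u00f6rg", "a-b"]:
    print(candidate, is_valid_username(candidate))
```

Only "john.smith" passes; the others fail on the underscore, the extra periods, the non-ASCII "ö", and the dash respectively.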
Many web-based user authentication systems don't allow usernames that contain characters other than letters, numbers and underscores.
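One subtlety when implementing a "letters, numbers and underscores" rule: in Python 3, `\w` in a `str` pattern matches any Unicode letter or digit, so a naive regex silently accepts non-ASCII names. A small illustration (the `check` helper is just for this demo):

```python
import re

# In Python 3, \w on str patterns matches Unicode letters/digits plus "_".
NAIVE = re.compile(r"\w+")
# With re.ASCII, \w means exactly [A-Za-z0-9_].
STRICT = re.compile(r"\w+", re.ASCII)

def check(name: str) -> tuple:
    """Return (accepted by NAIVE, accepted by STRICT)."""
    return bool(NAIVE.fullmatch(name)), bool(STRICT.fullmatch(name))

print(check("alice_01"))      # (True, True)
print(check("\u0441at"))      # Cyrillic 'с' + "at": (True, False)
print(check("\u0661\u0662"))  # Arabic-Indic digits: (True, False)
```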
It is said that on UNIX and similar systems, a username may contain a hyphen in addition to Latin letters, digits and the underscore. On the other hand, the hyphen is used as an operator in many programming languages.
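For reference, POSIX describes portable user names as drawn from the portable filename character set (A-Z, a-z, 0-9, '.', '_', '-') and says the hyphen should not be the first character. A leading hyphen is a practical problem because command-line tools such as `id` would parse a name like `-g` as an option. A minimal sketch of that check (the function name is made up for this example; real tools like `useradd` may apply their own, stricter defaults):

```python
import re

# Portable filename character set. The hyphen is placed last in the class
# so the regex engine treats it as a literal, not as a range operator.
PORTABLE = re.compile(r"[A-Za-z0-9._-]+")

def is_portable_user_name(name: str) -> bool:
    """Portable characters only, and no leading hyphen (it looks like an option)."""
    return bool(PORTABLE.fullmatch(name)) and not name.startswith("-")

for candidate in ["build-bot", "www_data.1", "-g", "j\u00f6rg"]:
    print(candidate, is_portable_user_name(candidate))
```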
It is 2018, and Google's Authentication API now supports Unicode passwords.
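If you accept Unicode passwords yourself, normalize before hashing: "é" can arrive as one code point (U+00E9) or as "e" plus a combining accent (U+0065 U+0301), which look identical but hash differently. The sketch below normalizes to NFC, in the spirit of the PRECIS OpaqueString profile (RFC 8265); it is an illustration, not how Google implements it, and the fixed salt is for the demo only.

```python
import hashlib
import unicodedata

def password_digest(password: str, salt: bytes) -> bytes:
    # NFC makes composed and decomposed forms of the same text byte-identical.
    normalized = unicodedata.normalize("NFC", password)
    return hashlib.pbkdf2_hmac("sha256", normalized.encode("utf-8"), salt, 100_000)

salt = b"example-salt"  # demo only; use a random per-user salt, e.g. os.urandom(16)
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)  # False
print(password_digest(composed, salt) == password_digest(decomposed, salt))  # True
```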
Homoglyph attacks. The usernames 'cat' and 'сat' are different Unicode strings even though they look the same. The first letter of the second 'сat' is the Russian 'с', "CYRILLIC SMALL LETTER ES" (U+0441) to be exact. The system can't easily tell that you're spoofing another user's name: to the computer, the nicks are different.
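One common mitigation is to compare a "skeleton" of each name rather than the raw string, loosely modelled on the confusable-detection approach in Unicode UTS #39. The `CONFUSABLES` table below is a tiny hand-picked, hypothetical subset; a real system would load Unicode's full confusables data.

```python
import unicodedata

for ch in "c\u0441":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0063 LATIN SMALL LETTER C
# U+0441 CYRILLIC SMALL LETTER ES

# Hypothetical subset: Cyrillic es, o, a, ie, er mapped to Latin look-alikes.
CONFUSABLES = {"\u0441": "c", "\u043e": "o", "\u0430": "a", "\u0435": "e", "\u0440": "p"}

def skeleton(name: str) -> str:
    """Fold case, then map look-alike characters so spoofed names collide."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)

existing = {skeleton("cat")}
print("\u0441at" == "cat")               # False: different strings
print(skeleton("\u0441at") in existing)  # True: same skeleton, so reject the signup
```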
Edit: Preventing mixed scripts does not solve the problem. For example, 'сосо' is pure Cyrillic and can be used to spoof the ASCII 'coco'.
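To see why, here is a rough mixed-script check. It guesses each letter's script from the first word of its Unicode character name, which is only a heuristic; a real implementation would use the Unicode Script property (UAX #24). The all-Cyrillic spoof sails through, and only the mixed variant is caught.

```python
import unicodedata

def scripts(name: str) -> set:
    # Heuristic: "LATIN SMALL LETTER C" -> "LATIN", "CYRILLIC SMALL LETTER O" -> "CYRILLIC".
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in name if ch.isalpha()}

def is_mixed_script(name: str) -> bool:
    return len(scripts(name)) > 1

spoof = "\u0441\u043e\u0441\u043e"   # Cyrillic 'сосо'
print(spoof == "coco")               # False
print(scripts(spoof))                # {'CYRILLIC'}
print(is_mixed_script(spoof))        # False: the spoof passes the check
print(is_mixed_script("\u0441oco"))  # True: only the mixed variant is caught
```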
Also, watch out for the left-to-right override character (and its bidirectional-control relatives). Left unsanitized, they can mess up the rendering of your whole page.
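HTML escaping does not help here, because these are not markup characters. A blunt but simple option is to reject usernames containing any format (Cf) or control (Cc) character; that category covers the overrides U+202D and U+202E as well as the bidi embeddings and isolates. This is a sketch of one policy, not the only one:

```python
import unicodedata

def has_invisible_controls(name: str) -> bool:
    """True if name contains a format (Cf) or control (Cc) character."""
    return any(unicodedata.category(ch) in ("Cf", "Cc") for ch in name)

spoof = "\u202enimda"  # RIGHT-TO-LEFT OVERRIDE + "nimda" renders as "admin"
print(has_invisible_controls("admin"))  # False
print(has_invisible_controls(spoof))    # True
```

Rejecting all of Cf also rejects the zero-width joiner and non-joiner, which some scripts legitimately use; an alternative is to accept them but isolate user-supplied text when rendering, for example with HTML's `<bdi>` element.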
HTTP authentication? There can be problems sending a Unicode username (and/or password) over existing protocols. One case I have run into before is Basic authentication: there is no well-defined way to send these Unicode usernames/passwords in the Basic auth header.
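The ambiguity comes from Basic auth sending base64 of raw bytes: the same username encoded as UTF-8 or as Latin-1 yields a different header, and the server has to guess which one the client used. RFC 7617 later let servers advertise `charset="UTF-8"` in the challenge, but you cannot assume every client honours it. A small demonstration (the helper is hypothetical):

```python
import base64

def basic_auth_header(user: str, password: str, encoding: str) -> str:
    # Header value is "Basic " + base64("user:password") in some byte encoding.
    raw = f"{user}:{password}".encode(encoding)
    return "Basic " + base64.b64encode(raw).decode("ascii")

utf8 = basic_auth_header("jos\u00e9", "pw", "utf-8")
latin1 = basic_auth_header("jos\u00e9", "pw", "latin-1")
print(utf8)
print(latin1)
print(utf8 == latin1)  # False: same credentials, two different headers
```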