Which Unicode characters (more precisely, code points) are dangerous and should be blacklisted so that users cannot enter them? I know that the BIDI override characters and the "zero width space" are prone to causing problems, but what others are there?
Thanks
An 8-character Unicode password is more secure than an 8-character ASCII password but less secure than a 64-character ASCII password.
The tallest Unicode character in the current standard (Unicode 6.1) is 🗻, U+1F5FB MOUNT FUJI, which is 3776 meters tall.
Aren't there character codes for other mountains?
If we go this route... Depending on the definition of "tall", there's at least the Sun (diameter 1e9 m) and a number of constellations (U+2648 ..
💀 U+1F480 SKULL
'HANGUL FILLER' (U+3164)
Since Unicode 1.1 (1993), there has been a wide character that renders as empty space.
You can't see it, and you can't copy and paste it on its own because you can't select it!
It needs to be generated, for example with the Unix keyboard shortcut CTRL + SHIFT + u followed by 3164.
It can pretty much 💩 up anything: variables, function names, URLs, file names, hash strings, database entries, blog posts, and logins. It can mimic DNS names and allow fake accounts that look identical to real ones, etc.
DEMO 1: Altering variables
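The variable hijaㅤcked contains a Hangul Filler character, while the last console.log call uses the name without that character:
const normal = "Hello w488ld"
const hijaㅤcked = "Hello w488ld" // U+3164 HANGUL FILLER sits between "hija" and "cked"
console.log(normal)
console.log(hijacked) // ReferenceError: the name without the filler was never declared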
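Since the filler cannot be seen, one way to confirm that two names really differ is to print their code points. The following is a minimal sketch, not part of the original demo; `codePoints` is a hypothetical helper name, and the filler is written as the escape `\u3164` so that it is visible in the source.

```javascript
// Hypothetical helper: list every code point in a string as U+XXXX.
function codePoints(text) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...text].map((ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const visible = "hijacked";
const withFiller = "hija\u3164cked"; // escape used so the filler shows up in source

console.log(codePoints(visible).join(" "));
console.log(codePoints(withFiller).join(" ")); // includes U+3164 after the first "a"
console.log(visible === withFiller); // the two strings are not equal
```

Running the same helper over a suspicious identifier, file name, or login copied from elsewhere makes any hidden code point show up in the list.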
DEMO 2: Hijack URL's
These 3 URLs will lead to xn--stackoverflow-fr16ea.com:
https://stackㅤㅤoverflow.com
https://stackㅤㅤoverflow.com
https://stackㅤㅤoverflow.com
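A URL like the ones above can be screened before it is displayed or stored. The sketch below is an assumption on my part rather than anything from the original answer: `findInvisible` is a hypothetical helper that reports where default-ignorable or bidirectional control code points occur, using the Unicode property escapes that JavaScript regular expressions support with the `u` flag.

```javascript
// Default_Ignorable_Code_Point covers U+3164, U+200B and similar characters;
// Bidi_Control covers the directional marks and overrides.
const INVISIBLE = /[\p{Default_Ignorable_Code_Point}\p{Bidi_Control}]/gu;

// Hypothetical helper: return the position and code point of each hit.
function findInvisible(text) {
  return [...text.matchAll(INVISIBLE)].map((m) => ({
    index: m.index, // offset in UTF-16 code units
    codePoint:
      "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

const spoofed = "https://stack\u3164\u3164overflow.com";
console.log(findInvisible(spoofed)); // two hits, both U+3164
console.log(findInvisible("https://stackoverflow.com")); // no hits
```

Legitimate text can also contain default-ignorable code points (for example the zero width joiner inside emoji sequences), so a hit is a reason to inspect or reject a hostname, not proof of an attack in free-form text.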
A golden rule in security is to whitelist instead of blacklist: rather than trying to cover every bad character, it is much better to validate that the user only uses known-good characters.
There are solutions that help you build the large whitelist that is required for international whitelisting. For example, in .NET there is UnicodeCategory.
The idea is that instead of whitelisting thousands of individual characters, the library assigns them to categories such as letters, digits, punctuation, and control characters, and you whitelist whole categories.
Tutorial on whitelisting international characters in .NET
Unicode Regex: Categories
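The same category-based idea can be sketched in JavaScript with Unicode property escapes. This is only an illustration under an assumed policy (letters, combining marks, decimal digits, connector and dash punctuation, and the ASCII space); `isAllowedName` is a hypothetical name. Note that U+3164 HANGUL FILLER has the general category Lo (other letter), so a category whitelist alone would let it through, which is why the sketch also rejects default-ignorable code points.

```javascript
// Assumed policy, not taken from the original answer.
const ALLOWED = /^[\p{L}\p{M}\p{Nd}\p{Pc}\p{Pd} ]+$/u;
// Catches characters such as U+3164 that are letters by category
// but are meant to be invisible.
const IGNORABLE = /\p{Default_Ignorable_Code_Point}/u;

// Hypothetical validator for a display name or similar short field.
function isAllowedName(input) {
  const text = input.normalize("NFC"); // compare composed forms consistently
  return ALLOWED.test(text) && !IGNORABLE.test(text);
}

console.log(isAllowedName("José_García-López")); // true
console.log(isAllowedName("hija\u3164cked")); // false (Hangul Filler)
console.log(isAllowedName("abc\u202Edef")); // false (BIDI override)
```

Which categories to allow depends on the field being validated; a password field, a username, and a blog post body would each need a different list.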
Characters aren't dangerous: only inappropriate uses of them are.
You might consider reading things like:
It is impossible to guess what you mean by dangerous.