
What Unicode characters are dangerous?

Which Unicode characters (more precisely, code points) are dangerous and should be blacklisted so that users cannot enter them? I know that the BIDI override characters and the "zero width space" are prone to cause problems, but what others are there?

Thanks

federico-t asked Nov 04 '11 01:11



3 Answers

'HANGUL FILLER' (U+3164)

Since Unicode 1.1 (1993), there has been a wide character that renders as nothing visible: the Hangul Filler.

We can't see it, and we can't copy and paste it on its own, because it can't be selected.

It has to be generated, for example with the Unix keyboard shortcut CTRL + SHIFT + u followed by 3164.

It can pretty much 💩 up anything: variables, function names, URLs, file names, DNS lookalikes, hash strings, database entries, blog posts, logins, fake accounts that look identical, etc.


DEMO 1: Altering variables

The second variable name contains a Hangul Filler character, while the console.log call references the name without it:

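const normal = "Hello w488ld"
// This name has an extra character embedded after "hija", so it is a different identifier
const hija慤cked = "Hello w488ld"
console.log(normal)   // prints the string
console.log(hijacked) // ReferenceError: hijacked is not defined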
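
A minimal sketch of how such characters could be caught before they reach source code or a database. It assumes an engine with Unicode property escapes and String.prototype.matchAll (ES2020 or later). The name findInvisible is hypothetical, not from any library; it reports every code point carrying the Default_Ignorable_Code_Point property, which covers U+3164 as well as the zero width space and the BIDI controls.

```javascript
// Hypothetical helper (not from any library): list invisible code points in a string.
// Assumes an ES2020+ engine: \p{...} needs the "u" flag, matchAll needs the "g" flag.
const INVISIBLE = /\p{Default_Ignorable_Code_Point}/gu;

function findInvisible(text) {
  const hits = [];
  for (const match of text.matchAll(INVISIBLE)) {
    // Format the code point as U+XXXX for reporting.
    const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    hits.push({ index: match.index, codePoint: "U+" + hex });
  }
  return hits;
}

// The hidden character is written as an escape so it is visible here.
console.log(findInvisible("hija\u3164cked")); // one hit: index 4, U+3164
console.log(findInvisible("hijacked"));       // no hits
```

Reporting the position rather than silently stripping the character makes it easier to decide per field whether to reject the input or clean it.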

DEMO 2: Hijacking URLs

These 3 URLs all lead to xn--stackoverflow-fr16ea.com:

https://stack慤慤overflow.com

https://stack慤慤overflow.com

https://stack慤慤overflow.com

NVRM answered Oct 17 '22 19:10


A golden rule in security is to whitelist rather than blacklist. Instead of trying to cover every bad character, it is a much better idea to validate by ensuring the user only uses known good characters.

There are solutions that help you build the large whitelist that is required for international whitelisting. For example, in .NET there is UnicodeCategory.

The idea is that instead of whitelisting thousands of individual characters, the library assigns them to categories such as alphanumeric characters, punctuation, control characters, and so on.
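
The same category-based approach can be sketched in JavaScript, assuming an engine with Unicode property escapes (ES2018 or later). The function name isAllowedName and the choice of allowed categories are illustrative assumptions, not a recommendation from any library. One caveat: U+3164 is itself classified as a letter (category Lo), so a category whitelist alone lets it through, and the sketch therefore also rejects default-ignorable code points.

```javascript
// Hypothetical validator: the name and the allowed categories are assumptions.
// Whitelist: letters (L), combining marks (M), decimal digits (Nd),
// and connector punctuation (Pc, e.g. "_").
const ALLOWED = /^[\p{L}\p{M}\p{Nd}\p{Pc}]+$/u;

// Invisible code points such as U+3164 are category Lo, so they pass the
// whitelist above and need a separate check.
const IGNORABLE = /\p{Default_Ignorable_Code_Point}/u;

function isAllowedName(name) {
  return ALLOWED.test(name) && !IGNORABLE.test(name);
}

console.log(isAllowedName("federico_t"));     // true
console.log(isAllowedName("Zo\u00EB42"));     // true: non-ASCII letters are allowed
console.log(isAllowedName("hija\u3164cked")); // false: hidden Hangul Filler
console.log(isAllowedName("a\u200Bb"));       // false: zero width space is not whitelisted
console.log(isAllowedName("a b"));            // false: spaces are not whitelisted
```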

Tutorial on whitelisting international characters in .NET

Unicode Regex: Categories

Desmond Zhou answered Oct 17 '22 18:10


Characters aren’t dangerous: only inappropriate uses of them are.

You might consider reading things like:

  • Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax
  • RFC 3454: Preparation of Internationalized Strings (“stringprep”)
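
One step that stringprep specifies is Unicode normalization with form KC (NFKC), which folds compatibility variants of a character into a single form before strings are compared. A minimal sketch of that step alone, using the standard String.prototype.normalize; the helper name canonicalName is hypothetical, and toLowerCase is a simplification of proper case folding, not a full stringprep profile.

```javascript
// Hypothetical helper: fold compatibility variants before comparing names.
// Shows only the NFKC step; a real stringprep profile also maps and
// prohibits specific code points.
function canonicalName(name) {
  return name.normalize("NFKC").toLowerCase();
}

// Fullwidth Latin letters (U+FF21 and friends) fold to plain ASCII.
console.log(canonicalName("\uFF21\uFF24\uFF2D\uFF29\uFF2E") === canonicalName("admin")); // true

// The "fi" ligature (U+FB01) expands to the two letters "f" and "i".
console.log(canonicalName("\uFB01le") === canonicalName("file")); // true
```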

It is impossible to guess what you mean by dangerous.

tchrist answered Oct 17 '22 20:10