
Is unicode-based XSS an issue?

Maybe this is better suited to Security Stack Exchange, I'm not sure, but here's the question:

I recently came across a blog claiming that ＜script＞alert(1)＜/script＞ (written with the fullwidth signs U+FF1C and U+FF1E instead of ASCII '<' and '>') will get parsed into an actual <script> element. However, in my tests on recent Chrome, this is not the case. Has anyone ever heard of a browser parsing it as real markup? If so, I have no idea how one would mitigate it, since presumably there are other characters besides '<' to worry about, and I know that I'm not up for going through all of Unicode to enumerate them.

asked Dec 09 '22 20:12 by wwaawaw


2 Answers

That would be in direct violation of the HTML specifications. Under them, the markup-significant characters are ASCII characters, whereas characters like U+FF1C FULLWIDTH LESS-THAN SIGN “＜” are just data characters with no special significance. Browsers would need extra code to map fullwidth characters to ASCII (either as an ad hoc mapping or, e.g., via normalization to NFKD or NFKC), but there’s no reason to assume they would do such things, any more than there is reason to think that they could start mapping “[” to “<”.

So a blog that claims otherwise is just describing a possibility that someone invented without any real grounds. You can usually tell this from the references and demonstrations given (that is, from the absence of them).
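To illustrate the point that fullwidth signs are plain data unless something explicitly maps them to ASCII, here is a small sketch. It uses Python's `html.parser.HTMLParser` as a stand-in for a browser; that parser is not the WHATWG tokenizer, so treat this as an illustration rather than proof of browser behavior. The `TagCollector` class and `parse` helper are hypothetical names for this demo. It also shows that NFKC normalization is exactly the kind of extra step that would turn the string into real markup.

```python
import unicodedata
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and text data the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


def parse(markup):
    """Feed markup to the parser; return (start tags seen, concatenated text)."""
    p = TagCollector()
    p.feed(markup)
    p.close()
    return p.tags, "".join(p.data)


# U+FF1C / U+FF1E are the fullwidth less-than / greater-than signs.
fullwidth = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"

# The parser sees no tag at all: the whole string comes back as text data.
print(parse(fullwidth))

# NFKC compatibility normalization folds the fullwidth signs into ASCII '<' and '>'.
normalized = unicodedata.normalize("NFKC", fullwidth)
print(normalized)

# Only after that extra step does the parser report a script start tag.
print(parse(normalized))
```

In other words, the risk only appears if some component deliberately performs a compatibility mapping; the parser itself never does.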

There are surely security issues around Unicode characters that look similar to each other, but then it’s a matter of human beings mistakenly taking one character for another even though they are internally quite different, like “＜” for “<” (and therefore, e.g., seeing a string in HTML source as a script element even though it isn’t) or “а” for “a” (a Cyrillic letter for a Latin letter with identical appearance). That is, people may see characters as identical even though programs see them as distinct.
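A quick way to see that such look-alikes are distinct to a program is to print their code points and Unicode names. This is a minimal sketch using Python's standard `unicodedata` module; the `describe` function is a hypothetical helper for the demo. Note that NFKC does fold the fullwidth sign into ASCII (it is a compatibility character), but it leaves the Cyrillic letter alone, because homoglyphs are generally not compatibility equivalents.

```python
import unicodedata


def describe(ch):
    """Hypothetical helper: format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Pairs that render (nearly) identically but are different characters.
pairs = [
    ("\uFF1C", "<"),  # fullwidth vs ASCII less-than sign
    ("\u0430", "a"),  # Cyrillic small a vs Latin small a
]

for lookalike, ascii_char in pairs:
    print(describe(lookalike), "vs", describe(ascii_char))
    # A program compares code points, so the pair is never equal.
    print("  equal:", lookalike == ascii_char)
    # NFKC unifies compatibility characters only, not homoglyphs.
    print("  equal after NFKC:", unicodedata.normalize("NFKC", lookalike) == ascii_char)
```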

answered Jan 31 '23 05:01 by Jukka K. Korpela


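No, a browser will not interpret text surrounded by fullwidth less-than or greater-than signs as valid HTML tags, but certain backends will transform them into normal ASCII '<' and '>' signs (for example, through "best-fit" mapping when converting Unicode text to a legacy character set). If that transformation happens after the input has been filtered or escaped, it creates an XSS risk. See the following: http://websec.github.io/unicode-security-guide/character-transformations/#best-fit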
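The practical mitigation is about ordering: escape the text as the very last step before it reaches the HTML. Below is a minimal sketch of the difference. Python's codecs do not implement Windows-style best-fit mapping, so NFKC normalization is used here as an assumed stand-in for "some backend transformation that folds fullwidth signs to ASCII". The function names `unsafe_render` and `safe_render` are hypothetical.

```python
import html
import unicodedata

# Fullwidth payload: html.escape() leaves it untouched, since it has no ASCII < > & " '.
payload = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"


def unsafe_render(user_input: str) -> str:
    # Escape first, then transform (NFKC standing in for a best-fit conversion).
    # The transformation runs after escaping and produces real '<' and '>'.
    escaped = html.escape(user_input)
    return unicodedata.normalize("NFKC", escaped)


def safe_render(user_input: str) -> str:
    # Transform first, then escape the final text, so any '<' or '>'
    # that reaches the page is encoded as '&lt;' or '&gt;'.
    transformed = unicodedata.normalize("NFKC", user_input)
    return html.escape(transformed)


print(unsafe_render(payload))  # live markup
print(safe_render(payload))    # inert, escaped text
```

With this ordering you don't need to enumerate every Unicode look-alike: whatever the transformations produce, the escaping step sees the final characters.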

answered Jan 31 '23 05:01 by Brian H