
Really Good, Bad UTF-8 example test data [closed]

So we have the XSS cheat sheet to test our XSS filtering, but other than an example benign page I can't find any evil or malformed test data to make sure that my UTF-8 code can handle misbehaving data.

Where can I find some good uh.. bad data to test with? Or what is a tricky sequence of chars?

Xeoncross asked Aug 23 '09 17:08


People also ask

What is an invalid UTF-8 character?

An "invalid UTF-8 character" error is raised when the uploaded file is not actually in UTF-8 format. UTF-8 is the dominant character encoding on the World Wide Web. The error occurs because the software that produced the file saved it in a different encoding, such as ISO-8859, so some of its bytes do not form valid UTF-8 sequences.

What characters are not included in UTF-8?

0xC0, 0xC1, 0xF5, 0xF6, 0xF7, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF are invalid UTF-8 code units. A UTF-8 code unit is 8 bits, so if by "char" you mean an 8-bit byte, these are the byte values that never appear in well-formed UTF-8 encoded text.
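As a rough illustration of the list above, the following Python sketch scans a byte string for those thirteen values. The names `NEVER_VALID` and `find_forbidden_bytes` are made up for this example. Passing this check does not make data valid UTF-8; it only catches the bytes that can never be part of it.

```python
# Byte values that never occur in well-formed UTF-8:
# 0xC0 and 0xC1 (would only start overlong 2-byte forms) and 0xF5..0xFF.
NEVER_VALID = frozenset({0xC0, 0xC1} | set(range(0xF5, 0x100)))


def find_forbidden_bytes(data: bytes) -> list:
    """Return (offset, byte) pairs for every byte that is never valid in UTF-8."""
    return [(i, b) for i, b in enumerate(data) if b in NEVER_VALID]


if __name__ == "__main__":
    sample = b"a\xc0\xafb\xff"
    for offset, value in find_forbidden_bytes(sample):
        print(f"offset {offset}: 0x{value:02X}")
```

A scan like this is cheap enough to run as a first filter before a full decode.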

What is UTF-8 an example of?

UTF-8 is a Unicode character encoding method. This means that UTF-8 takes the code point for a given Unicode character and translates it into a sequence of one to four bytes.
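To make the translation from code point to bytes concrete, here is a minimal hand-written encoder in Python. The function name `encode_code_point` is hypothetical; in real code you would simply call `str.encode("utf-8")`, which the sketch is compared against at the end.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch only)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp!r}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for ch in "A\u00e9\u20ac\U0001F600":
        # The hand-rolled result should match Python's built-in encoder.
        assert encode_code_point(ord(ch)) == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {encode_code_point(ord(ch)).hex(' ')}")
```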

What is a valid UTF-8?

A valid UTF-8 character can be 1 - 4 bytes long. For a 1-byte character, the first bit is 0, followed by the 7 bits of its Unicode code point. For an n-byte character (n from 2 to 4), the first n bits of the first byte are all ones and bit n+1 is 0; it is followed by n-1 continuation bytes whose two most significant bits are 10.
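The bit-pattern rule above can be sketched as a small validator in Python. The name `is_valid_utf8` is made up for this example. Note that the bit pattern alone is not enough for a strict check: the sketch additionally rejects overlong encodings, surrogate code points (U+D800 to U+DFFF) and values above U+10FFFF, which is an assumption about how strict you want to be.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (illustrative sketch)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                 # 0xxxxxxx
            length, cp, min_cp = 1, b, 0
        elif 0xC0 <= b < 0xE0:       # 110xxxxx
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif 0xE0 <= b < 0xF0:       # 1110xxxx
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif 0xF0 <= b < 0xF8:       # 11110xxx
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            return False             # stray continuation byte or 0xF8..0xFF
        if i + length > n:
            return False             # sequence is cut off at end of input
        for c in data[i + 1:i + length]:
            if c & 0xC0 != 0x80:     # continuation bytes must be 10xxxxxx
                return False
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, surrogates and out-of-range values.
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return False
        i += length
    return True
```

For example, `is_valid_utf8("h\u00e9llo".encode("utf-8"))` is true, while `is_valid_utf8(b"\xc0\xaf")` is false because those two bytes are an overlong encoding of "/".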


1 Answer

Check out Markus Kuhn’s UTF-8 decoder stress test
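For a quick start before working through a full stress test, a handful of malformed sequences can be built by hand. The Python sketch below is not taken from that test file; the sample set and the names `MALFORMED_SAMPLES`, `decodes_strictly` and `accepted_samples` are assumptions chosen for illustration. Each sample should be rejected by a strict decoder.

```python
# Hand-picked malformed inputs; every one of these is invalid UTF-8.
MALFORMED_SAMPLES = {
    "lone continuation byte": b"\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
    "overlong encoding of '/'": b"\xc0\xaf",
    "overlong encoding of NUL": b"\xe0\x80\x80",
    "UTF-16 surrogate U+D800": b"\xed\xa0\x80",
    "code point above U+10FFFF": b"\xf4\x90\x80\x80",
    "byte that is never valid": b"\xff",
    "obsolete 5-byte form": b"\xf8\x88\x80\x80\x80",
}


def decodes_strictly(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def accepted_samples(decoder=decodes_strictly) -> list:
    """Names of malformed samples that the given decoder wrongly accepts."""
    return [name for name, raw in MALFORMED_SAMPLES.items() if decoder(raw)]


if __name__ == "__main__":
    leaks = accepted_samples()
    print("all rejected" if not leaks else f"accepted bad input: {leaks}")
```

Passing your own validation function as `decoder` lets you run the same samples against the code under test.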

zildjohn01 answered Sep 20 '22 18:09