
What makes a good test string for testing web forms for unicode compatibility?

Tags: unicode, xss

What test text do you type into your web forms to check that they handle all the edge cases properly, especially Unicode and XSS-style problems?

I am particularly interested in good Unicode strings that may do something odd if they are mis-encoded when they are displayed again.

Text that contains potentially problematic characters, such as quotes, <, and >, would also be interesting.

Rik Heywood asked Aug 27 '09 19:08


People also ask

What is a Unicode string?

Unicode is a standard that represents characters from almost all languages. Every Unicode character is assigned a unique integer code point between 0 and 0x10FFFF. A Unicode string is a sequence of zero or more code points.

How to test Unicode characters?

To test whether a program is fully Unicode compliant, enter text that mixes languages written in different directions and characters with diacritics, for example Persian text. Also try decomposed characters, for example {e, U+0301} (the decomposed form of é, U+00E9).
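The decomposed-character case can be sketched in Python with the standard unicodedata module. This is only an illustration; the variable names are made up for the example, and a real form test would submit both strings and compare what comes back.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Both render as é, but they differ code point by code point.
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 1 2

# Normalizing to a common form makes the two comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)  # True True
```

If a form stores one spelling and later searches with the other, a lookup can fail unless both sides are normalized the same way.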

How do I check if a string contains Unicode?

To check whether a string contains non-ASCII characters, compare each character's code point with 127. Any character with a value greater than 127 is not ASCII.
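A minimal Python sketch of that comparison follows. The helper names contains_non_ascii and non_ascii_chars are hypothetical, chosen for this example only.

```python
def contains_non_ascii(text):
    """Return True if any character has a code point above 127."""
    return any(ord(ch) > 127 for ch in text)


def non_ascii_chars(text):
    """Return the characters whose code point is above 127, in order."""
    return [ch for ch in text if ord(ch) > 127]


print(contains_non_ascii("cafe"))   # False
print(contains_non_ascii("café"))   # True
print(non_ascii_chars("naïve ℓ"))   # ['ï', 'ℓ']
```

On Python 3.7 and later, the built-in str.isascii() performs the same check for a whole string.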


2 Answers

Your idea of HTML-sensitive characters is a good start. I also like using characters that are kind of readable, but are still Unicode. When I was doing this kind of testing for tabblo.com, I used this string:

Testing «ταБЬℓσ»: 1<2 & 4+1>3, now 20% off!

This has HTML-sensitive characters (<, >, &), ASCII, upper-half ISO 8859-1 characters (« and »), and multi-byte Unicode characters beyond Latin-1 (Greek, Cyrillic, and the script ℓ).
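As a rough illustration of what this string exercises (not part of the original tabblo.com tests), here is a Python sketch. It escapes the string for HTML, round-trips it through UTF-8, and shows the kind of garbling that appears when UTF-8 bytes are wrongly decoded as Latin-1. The name SAMPLE is just a label for this example.

```python
import html

SAMPLE = "Testing «ταБЬℓσ»: 1<2 & 4+1>3, now 20% off!"

# HTML escaping should touch only <, > and &, leaving the rest intact.
escaped = html.escape(SAMPLE)
print(escaped)  # Testing «ταБЬℓσ»: 1&lt;2 &amp; 4+1&gt;3, now 20% off!

# A correct UTF-8 round trip returns the original string.
utf8_bytes = SAMPLE.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Decoding the same bytes as Latin-1 never raises, but yields mojibake.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake.startswith("Testing Â«"))  # True
```

Because the string stays roughly readable, a mis-encoded or double-escaped copy is easy to spot by eye once the form echoes it back.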

Ned Batchelder answered Oct 20 '22 19:10


Turkey testing!

http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html

This is actually pretty advanced internationalization testing, not for the faint of heart. It covers date formatting, percent calculations, upper/lowercase conversions, and more.
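The upper/lowercase part can be hinted at with a small Python sketch. It is not taken from the linked article, and it only shows Python's locale-independent case mappings for the Turkish dotted and dotless i. A Turkish-aware conversion would need extra handling that is not shown here.

```python
dotted_capital = "\u0130"   # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small = "\u0131"    # ı, LATIN SMALL LETTER DOTLESS I

# Python's str methods ignore the locale and use the default Unicode mappings.
lowered = dotted_capital.lower()   # "i" followed by U+0307 COMBINING DOT ABOVE
uppered = dotless_small.upper()    # plain ASCII "I"

print(len(lowered))       # 2
print(uppered == "I")     # True

# Turkish rules expect "i".upper() to be İ and "I".lower() to be ı,
# which the default mappings do not produce.
print("i".upper() == dotted_capital)  # False
print("I".lower() == dotless_small)   # False
```

Feeding İ and ı through a form that normalizes case (for usernames or search, say) is a quick way to see whether the backend makes the same assumption.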

willoller answered Oct 20 '22 20:10