What test text do you type into your web forms to check that they handle all the edge cases properly (especially Unicode and XSS-style problems)?
I am particularly interested in good Unicode strings that may do something odd if they are mis-encoded when they are displayed again.
Text that contains potentially problematic characters, like quotes, <, >, etc., would also be interesting.
Unicode is a standard that represents characters from almost all languages. Every Unicode character is assigned a unique integer code point between 0 and 0x10FFFF. A Unicode string is a sequence of zero or more code points.
To test whether a program is fully Unicode compliant, write text that mixes different languages, different writing directions (for example, right-to-left Persian next to left-to-right English), and characters with diacritics. Also try decomposed characters, for example: {e, U+0301} (the decomposed form of é, U+00E9).
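To illustrate the decomposed-character case, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `same_text` and the sample strings are my own choices for illustration; the point is only that the two forms of é differ code point by code point but compare equal after normalization.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point (U+00E9)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# A mixed-direction sample: English, Persian (right-to-left), and a decomposed é.
mixed_sample = "Hello \u0633\u0644\u0627\u0645 caf" + decomposed


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # the raw strings differ
print(same_text(composed, decomposed))  # they match once normalized
print(len(composed), len(decomposed))   # lengths differ: 1 vs 2 code points
```

If a form stores the decomposed string and later compares or searches it against the composed one without normalizing, lookups can silently fail, which is what this kind of input is meant to expose.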
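If you want to check whether a string contains any non-ASCII characters, compare each character's code point with 127: if the value is greater than 127, the character is not ASCII. (ASCII characters are themselves part of Unicode, so this check distinguishes ASCII from non-ASCII, not Unicode from non-Unicode.)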
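The comparison against 127 can be sketched in Python as follows. The function names `is_ascii` and `non_ascii_chars` are hypothetical, chosen only for this example.

```python
def is_ascii(text):
    """Return True if every character's code point is 127 or lower."""
    return all(ord(ch) <= 127 for ch in text)


def non_ascii_chars(text):
    """Return the characters whose code point is greater than 127, in order."""
    return [ch for ch in text if ord(ch) > 127]


print(is_ascii("plain text"))
print(non_ascii_chars("na\u00efve caf\u00e9"))
```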
Your idea of HTML-sensitive characters is a good start. I also like using characters that are kind of readable, but are still Unicode. When I was doing this kind of testing for tabblo.com, I used this string:
Testing «ταБЬℓσ»: 1<2 & 4+1>3, now 20% off!
This has HTML-sensitive characters (<, >, &), plain ASCII, upper-half ISO-8859-1 characters (the « » quotes), and multi-byte Unicode characters (Greek, Cyrillic, and ℓ).
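As a rough sketch of how such a string can be used in an automated check, the Python below escapes it with the standard library's `html.escape` and simulates one common mis-encoding (UTF-8 bytes read back as ISO-8859-1). The function names are assumptions made for this example, not part of any particular site's code.

```python
import html

SAMPLE = "Testing «ταБЬℓσ»: 1<2 & 4+1>3, now 20% off!"


def escape_for_html(text):
    """Escape &, <, > and quotes so the text is safe inside HTML."""
    return html.escape(text, quote=True)


def simulate_mojibake(text):
    """Encode as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_mojibake(garbled):
    """Reverse simulate_mojibake; only valid for text garbled that exact way."""
    return garbled.encode("latin-1").decode("utf-8")


escaped = escape_for_html(SAMPLE)
garbled = simulate_mojibake(SAMPLE)

print(escaped)
print(garbled)
print(undo_mojibake(garbled) == SAMPLE)
```

If the page echoes back something that looks like the garbled form, the bytes were most likely decoded with the wrong charset somewhere between the form, the database, and the rendered page; if `<` or `&` come back unescaped, the output path is not HTML-escaping user input.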
Turkey testing!
http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
This is actually pretty advanced internationalization testing, not for the faint of heart, including date formatting, percent calculations, upper/lowercase translations, etc.
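For the upper/lowercase part, the characters at the heart of the Turkey test are the four Turkish forms of "i". The Python sketch below only lists how the language's built-in, locale-independent `str.upper()` and `str.lower()` map them; it does not reproduce Turkish-locale behaviour, and the names `TURKISH_SAMPLES` and `case_report` are made up for this example.

```python
# i, I, dotless ı (U+0131), dotted İ (U+0130)
TURKISH_SAMPLES = ["i", "I", "\u0131", "\u0130"]


def case_report(samples):
    """Map each sample to its (uppercase, lowercase) forms."""
    return {s: (s.upper(), s.lower()) for s in samples}


report = case_report(TURKISH_SAMPLES)
for char, (upper, lower) in report.items():
    # Show code points so that combining marks are visible.
    print(
        "U+%04X" % ord(char),
        [hex(ord(c)) for c in upper],
        [hex(ord(c)) for c in lower],
    )
```

Note that lowercasing İ here yields two code points ("i" plus a combining dot, U+0307), so even string lengths can change under case conversion. Typing these four characters into a form is a quick way to find code that assumes case mapping is the same in every locale.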