 

What are the most difficult-to-render Unicode samples?

I'm trying to implement a cross-platform (desktop browsers, iOS, & Android) typography system that allows users to input any Unicode string.

What are some strings I should use to stress-test my system and ensure the most nines of users will have a good experience? Is there a standard or de-facto standard list that I can also use?

Ky. asked Dec 30 '15 22:12




2 Answers

Here are some strings that I use in tests like that:

  • Vertically-stacked characters: Z̤͔ͧ̑̓ä͖̭̈̇lͮ̒ͫǧ̗͚̚o̙̔ͮ̇͐̇
  • Right-to-left words: اختبار النص
  • Mixed-direction words: من left اليمين to الى right اليسار
  • Mixed-direction characters: a‭b‮c‭d‮e‭f‮g
  • Very long characters: ﷽﷽﷽﷽﷽﷽﷽﷽﷽﷽﷽﷽﷽﷽﷽﷽
  • Emoji with skin-tone variations: 👱👱🏻👱🏼👱🏽👱🏾👱🏿
  • Emoji with gender variations: 🧟‍♀️🧟‍♂️
  • Emoji created by combining codepoints: 👨‍❤️‍💋‍👨👩‍👩‍👧‍👦🏳️‍⚧️🇵🇷
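One way to keep strings like these in a test suite is to write them as escape sequences, so the invisible code points (zero-width joiners, variation selectors, bidi overrides) stay visible in source control. The Python sketch below assumes that approach; the `SAMPLES` dictionary, the `measure` helper, and the exact code point sequences chosen (for example woman, woman, girl, boy for the family emoji) are my own illustrative assumptions, not part of the answer. It reports how many code points, UTF-16 code units, and UTF-8 bytes each sample uses, plus how many invisible format characters and combining marks it contains, since different layers of a cross-platform stack may measure "length" in any of these units.

```python
import unicodedata

# Hypothetical fixture: each sample is spelled with escapes so that
# invisible code points (ZWJ, VS16, bidi overrides) are explicit.
SAMPLES = {
    "zalgo": "Z\u0324\u0354\u0367",  # Z + three stacked combining marks
    "bidi_overrides": "a\u202db\u202ec\u202dd\u202ee\u202df\u202eg",  # LRO/RLO
    "bismillah": "\ufdfd",  # a single, very wide ligature code point
    "skin_tone": "\U0001F471\U0001F3FD",  # base emoji + skin-tone modifier
    "zombie_woman": "\U0001F9DF\u200d\u2640\ufe0f",  # ZWJ gender sequence
    "family": "\U0001F469\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466",
    "trans_flag": "\U0001F3F3\ufe0f\u200d\u26a7\ufe0f",
    "flag_pr": "\U0001F1F5\U0001F1F7",  # regional indicators P + R
}


def measure(s: str) -> dict[str, int]:
    """Count the units that different layers of a stack may call 'length'."""
    return {
        "code_points": len(s),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "utf8_bytes": len(s.encode("utf-8")),
        # Invisible format characters such as ZWJ and bidi overrides (Cf).
        "format_chars": sum(unicodedata.category(c) == "Cf" for c in s),
        # Mark categories (Mn, Mc, Me), which include variation selectors.
        "marks": sum(unicodedata.category(c).startswith("M") for c in s),
    }


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name:15} {measure(text)}")
```

None of these counts is the number of user-perceived characters (grapheme clusters): the family emoji is 7 code points and 11 UTF-16 units but displays as one glyph on platforms that support it. Python's standard library does not implement grapheme segmentation, so checking that cursor movement and deletion treat each sample as a single unit would need a separate tool.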
Ky. answered Oct 14 '22 06:10



Some others:

  • Mirrored characters in right-to-left scripts. For example, parentheses are displayed mirrored in Hebrew text. The Unicode standard lists every such character (the Bidi_Mirrored property, with mappings in BidiMirroring.txt).
  • Scripts with contextual letter shaping: Arabic, Devanagari (used for Hindi), etc.
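The direction and mirroring behaviour above can be spot-checked with Python's standard `unicodedata` module: `unicodedata.bidirectional` returns a character's bidi class (`R` for Hebrew letters, `AL` for Arabic letters), and `unicodedata.mirrored` reports the Bidi_Mirrored property. The helper `rtl_risks` below is hypothetical, a minimal sketch for picking out which characters in a test string depend on correct right-to-left handling. It does not perform shaping; whether Arabic joining forms or Devanagari conjuncts render correctly still has to be checked visually or with a shaping engine such as HarfBuzz.

```python
import unicodedata


def rtl_risks(text: str) -> dict[str, list[str]]:
    """Hypothetical helper: list characters whose display depends on bidi context."""
    # Strong right-to-left characters: R (e.g. Hebrew) and AL (Arabic letters).
    strong_rtl = [c for c in text if unicodedata.bidirectional(c) in ("R", "AL")]
    # Characters with Bidi_Mirrored=Yes, which a renderer must flip in RTL runs.
    mirrored = [c for c in text if unicodedata.mirrored(c)]
    return {"strong_rtl": strong_rtl, "mirrored": mirrored}


if __name__ == "__main__":
    # Hebrew "shalom" in parentheses, spelled with escapes for clarity.
    print(rtl_risks("(\u05e9\u05dc\u05d5\u05dd)"))
```

Running it on a parenthesized Hebrew word reports both parentheses as mirrored, which is exactly the case a renderer without proper bidi support tends to get backwards.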
Rich Taylor answered Oct 14 '22 06:10
