
What's the big deal with unicode?

Tags:

unicode

I've heard a lot of people talk about how some new version of a language now supports Unicode, and how much of an achievement Unicode is. What's the big deal about being able to support a new character set? It seems like something that would rarely, if ever, be used, but people mention it quite often. What's the benefit, and why do people use or even care about Unicode?

Daisetsu asked Feb 02 '11 22:02


2 Answers

Programming languages are used to produce software.

Software is used to solve problems faced by humans.

Producing software has a cost.

Software that solves problems for humans produces value. This value can be expressed in the form of profit, or the reduction of costs, depending on the business model of the software developer. How the value is expressed is irrelevant for the purposes of this discussion; what is relevant is that net value is produced.

There are seven billion humans in the world. A significant fraction of them are most comfortable reading text that is not written in the Latin alphabet.

Software which purports to solve a problem for some fraction of those seven billion humans who do not use the Latin alphabet does so more effectively if developers can easily manipulate text written in non-Latin alphabets.

Therefore, a programming language which supports non-Latin character sets lowers the costs of software developers, thereby enabling them to solve more problems for more people at lower costs, and thereby produce more value.

Unicode is the de facto standard for manipulation of non-Latin text.

Therefore, Unicode is important to the design and implementation of programming languages.

Our goal as programming language designers is the creation of tools which produce maximum value. Supporting Unicode is an easy way to massively increase the scope and range of real human problems that can be solved in software.

Eric Lippert answered Sep 29 '22 21:09


In the beginning, each character was stored in a single byte, so there were only 256 possible characters, and many different code pages assigned those 256 values to different characters. It became a tangled mess. Supporting multiple languages and multiple character sets became a programmer's nightmare.
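To make the code page problem concrete, here is a minimal Python sketch. It assumes the standard `cp1252`, `cp1251`, and `cp437` codecs that ship with Python; the byte value 0xE9 is just an arbitrary pick for illustration.

```python
# One byte, three code pages, three different characters.
raw = bytes([0xE9])

decoded = {
    "cp1252": raw.decode("cp1252"),  # Western European Windows: e with acute
    "cp1251": raw.decode("cp1251"),  # Cyrillic Windows: Cyrillic short i
    "cp437": raw.decode("cp437"),    # original IBM PC code page
}

for code_page, character in decoded.items():
    print(f"{code_page}: {character!r} (U+{ord(character):04X})")
```

Without knowing which code page a file was written in, the byte alone does not tell you which character was meant. That is the mess described above.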

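Then the Unicode Consortium was formed. It created a standard with a single character set intended to cover almost all languages of the world. The original design allowed 256 x 256 = 65,536 characters (plus combinations thereof); the standard has since been extended to 1,114,112 possible code points, of which only a fraction are currently assigned.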
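As a rough illustration of both points (code points beyond 65,536, and characters built from combinations), here is a small Python sketch. The specific characters are my own picks, not anything from the standard's examples.

```python
import unicodedata

# U+1F600 lies outside the first 65,536 code points, so UTF-16
# needs two 16-bit units (a surrogate pair) to represent it.
emoji = "\U0001F600"
code_point = ord(emoji)
utf16_units = len(emoji.encode("utf-16-le")) // 2
utf8_bytes = len(emoji.encode("utf-8"))
print(hex(code_point), utf16_units, utf8_bytes)

# "Combinations thereof": a base letter plus a combining accent
# is two code points, and NFC normalization folds them into one.
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

This is why "one character equals one 16-bit value" is no longer a safe assumption, even though it was the original design.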

The biggest advantage is that a single character string may contain multiple languages. That is no small thing.
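For example, a sketch in Python 3, where `str` is a sequence of Unicode code points. The greeting text is made up, and taking the first word of each character's Unicode name is only a crude stand-in for its script, good enough for this demo:

```python
import unicodedata

# One ordinary string holding English, Russian, Japanese and Hebrew.
greeting = "Hello, Привет, こんにちは, שלום"

# Approximate each letter's script by the first word of its Unicode name
# (e.g. "CYRILLIC SMALL LETTER ER" -> "CYRILLIC"). This is a heuristic,
# not a real script lookup.
scripts = set()
for ch in greeting:
    if ch.isalpha():
        scripts.add(unicodedata.name(ch).split()[0])

print(sorted(scripts))

# The whole string survives a round trip through a single encoding.
round_tripped = greeting.encode("utf-8").decode("utf-8")
print(round_tripped == greeting)
```

With code pages, this string could not be stored in one encoding at all; you would have had to pick one language and lose the others.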

Unicode has been the native character encoding in Windows since Windows 2000. It is also supported as a character set in HTML and on websites.

If your application does not support Unicode and you are not planning to add support, it is only a matter of time until it is left behind.

lkessler answered Sep 29 '22 22:09
