Unicode Support in Various Programming Languages


Perl

Perl has built-in Unicode support, though with caveats. The relevant perldoc pages:

  • perlunitut - Tutorial on using Unicode in Perl. Largely teaches in absolute terms about what you should and should not do as far as Unicode. Covers basics.
  • perlunifaq - Frequently asked questions about Unicode in Perl.
  • perluniintro - Introduction to Unicode in Perl. Less "preachy" than perlunitut.
  • perlunicode - For when you absolutely have to know everything there is to know about Unicode and Perl.

Python 3k

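Python 3k (also known as Python 3.0 or Python 3000) takes a new approach to handling text and binary data: str always holds Unicode text and bytes holds raw data, replacing Python 2's split between unicode and 8-bit strings.
See "Text Vs. Data Instead Of Unicode Vs. 8-bit" in What's New In Python 3.0. See also the Unicode HOWTO.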
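
The text/data split can be sketched in a few lines. This is only an illustration; the sample string and the helper name mixing_fails are made up for the example and are not taken from the Python documentation.

```python
# Text (str) and binary data (bytes) are separate types in Python 3.
text = "na\u00efve caf\u00e9"      # str: Unicode text ("naïve café")
data = text.encode("utf-8")        # bytes: that text encoded as UTF-8


def mixing_fails() -> bool:
    """Return True if concatenating str and bytes raises TypeError.

    Hypothetical helper, used only to demonstrate the type separation.
    """
    try:
        text + data  # type: ignore[operator]
    except TypeError:
        return True
    return False


print(type(text).__name__, len(text))   # str 10   (code points)
print(type(data).__name__, len(data))   # bytes 12 (UTF-8 bytes)
print(data.decode("utf-8") == text)     # True: decoding recovers the text
print(mixing_fails())                   # True: str and bytes do not mix
```

Converting between the two always goes through an explicit encode or decode call, so the encoding in use is never implicit.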


Java

Like .NET, Java uses UTF-16 internally. From the java.lang.String documentation:

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
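
The surrogate-pair arithmetic behind that description can be worked through as follows. The sketch is written in Python rather than Java, purely to show the numbers; the function names utf16_code_units and surrogate_pair are hypothetical, and the code-unit count is what one would expect a UTF-16 based length to report for the same string.

```python
def utf16_code_units(s: str) -> int:
    """Count the 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (above U+FFFF) into its
    high and low surrogates, following the UTF-16 encoding rule."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


bmp_char = "A"                 # U+0041, inside the Basic Multilingual Plane
supplementary = "\U0001F600"   # U+1F600, outside the BMP

print(utf16_code_units(bmp_char))        # 1
print(utf16_code_units(supplementary))   # 2: one character, two positions
high, low = surrogate_pair(0x1F600)
print(hex(high), hex(low))               # 0xd83d 0xde00
```

This is why indexing by code unit can land in the middle of a character: the two halves of a pair occupy separate index positions.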


HQ9+

The Q command has complete Unicode support in most implementations.


Go

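Google's Go programming language supports Unicode and is built around UTF-8: source files are UTF-8, string literals hold UTF-8 bytes, and the rune type represents a single Unicode code point.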
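
Working with UTF-8 means a string's byte length and its character count can differ. The sketch below shows that difference in Python rather than Go; it corresponds roughly to the gap between len(s) and utf8.RuneCountInString(s) in Go. The sample string and the helper name utf8_width are assumptions made for the example.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


sample = "h\u00e9llo, \u4e16\u754c"   # "héllo, 世界"

byte_length = len(sample.encode("utf-8"))   # size of the UTF-8 encoding
code_points = len(sample)                   # number of Unicode code points

print(byte_length)                       # 14
print(code_points)                       # 9
print([utf8_width(ch) for ch in sample])
# [1, 2, 1, 1, 1, 1, 1, 3, 3]
```

ASCII characters take one byte, the accented Latin letter takes two, and each CJK character takes three, so byte offsets and character positions only coincide for pure ASCII text.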


Delphi

Delphi 2009 fully supports Unicode. The default string type was changed to UnicodeString, which is UTF-16 encoded, and most libraries, including third-party ones, support Unicode. See Marco Cantù's Delphi and Unicode.

Prior to Delphi 2009, support for Unicode was limited, but WideChar and WideString were available for storing 16-bit encoded strings. See Unicode in Delphi for more info.

Note that you can still develop bilingual CJKV applications without using Unicode. For example, a Shift JIS encoded string for Japanese can be stored in a plain AnsiString.
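
The idea of keeping Shift JIS text in a plain byte string can be illustrated outside Delphi. The sketch below uses Python's bytes type as a stand-in for AnsiString; that mapping is an analogy made for this example, not a description of how Delphi stores the data.

```python
# "日本語" written with escapes so the source file encoding does not matter.
japanese = "\u65e5\u672c\u8a9e"

# Stored as raw bytes in a legacy encoding, no Unicode string involved.
sjis_bytes = japanese.encode("shift_jis")

print(len(sjis_bytes))   # 6: each of these kanji is a double-byte character

# The bytes only become text again if the reader knows the encoding.
print(sjis_bytes.decode("shift_jis") == japanese)   # True

# Interpreting the same bytes under a different single-byte encoding
# succeeds but produces unrelated characters (mojibake).
print(sjis_bytes.decode("latin-1") == japanese)     # False
```

The trade-off is that the encoding has to be tracked out of band, since nothing in the byte string itself says it is Shift JIS.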