Can we switch between ASCII and Unicode

I came across "a char variable is in Unicode format, but adopts / maps well to ASCII also". What is the need to mention that? Of course ASCII is 1 byte and Unicode is 2, and Unicode itself contains the ASCII codes in it (by default - it's the standard). So are there some languages in which a char variable supports Unicode but not ASCII?

Also, the character format (Unicode/ASCII) is decided by the platform we use, right (UNIX, Linux, Windows, etc.)? So suppose my platform uses ASCII - is it not possible to switch to Unicode, or vice versa?

asked Mar 25 '13 by shar


1 Answer

Java uses Unicode internally. Always. Actually, it uses UTF-16 most of the time, but that's too much detail for now.

It cannot use ASCII internally (for a String, for example). Any String that can be represented in ASCII can also be represented in Unicode, so that is not a problem.
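To make that concrete, here is a minimal sketch (the class name and the sample characters are just for illustration): a char is a UTF-16 code unit, so non-ASCII characters fit directly, and characters outside the Basic Multilingual Plane take two code units.

public class UnicodeInternals {
    public static void main(String[] args) {
        // char is a UTF-16 code unit, so non-ASCII characters fit directly
        char euro = '\u20AC';            // the Euro sign, well outside ASCII
        System.out.println(euro);        // prints €, console encoding permitting

        // A supplementary character (outside the BMP) needs two code units
        String clef = "\uD834\uDD1E";    // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(clef.length());                          // 2 (UTF-16 code units)
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 (Unicode code point)
    }
}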

The only place where the platform comes into play is when Java has to choose an encoding because you didn't specify one. For example, when you create a FileWriter to write String values to a file: at that point Java needs an encoding to define how each character is mapped to bytes. If you don't specify one, the default encoding of the platform is used. That default encoding is almost never ASCII. Most Linux platforms use UTF-8, Windows often uses some ISO-8859-* derivatives (or other culture-specific 8-bit encodings), but no current OS uses ASCII (simply because ASCII can't represent a lot of important characters).
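As an illustration, here is a rough sketch (the file name out.txt and the sample text are made up) that prints the platform default and then picks UTF-8 explicitly via an OutputStreamWriter instead of relying on that default:

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingChoice {
    public static void main(String[] args) throws Exception {
        // This is what FileWriter falls back to when no encoding is given
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Choosing the encoding explicitly removes the dependency on the platform
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream("out.txt"), StandardCharsets.UTF_8)) {
            out.write("Grüße, мир, 世界"); // characters plain ASCII could never hold
        }
    }
}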

In fact, pure ASCII is almost irrelevant these days: no one uses it. ASCII is only important as a common subset of the mapping of most 8-bit encodings (including UTF-8): the lower 128 Unicode codepoints map 1:1 to the numeric values 0-127 in many, many encodings. But pure ASCII (where the values 128-255 are undefined) is no longer in active use.
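A small sketch to illustrate that overlap (the string literals are chosen arbitrarily): the encoded bytes for ASCII-only text are identical in US-ASCII, ISO-8859-1 and UTF-8, while anything outside the 0-127 range encodes differently.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiSubset {
    public static void main(String[] args) {
        String ascii = "Hello";
        // For code points 0-127 the bytes are identical across these charsets
        System.out.println(Arrays.equals(
                ascii.getBytes(StandardCharsets.US_ASCII),
                ascii.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(Arrays.equals(
                ascii.getBytes(StandardCharsets.US_ASCII),
                ascii.getBytes(StandardCharsets.ISO_8859_1))); // true

        // Outside that range the encodings diverge
        String accented = "café";
        System.out.println(Arrays.toString(accented.getBytes(StandardCharsets.UTF_8)));      // é becomes two bytes: -61, -87
        System.out.println(Arrays.toString(accented.getBytes(StandardCharsets.ISO_8859_1))); // é becomes one byte: -23
    }
}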

As a side note, Java 9 has an internal optimization called "compact strings" where Strings that contain only characters representable in Latin-1 use a single byte per character instead of 2. This optimization is very useful for all kinds of "computer speak" like XML and similar protocols where the majority of the text is in the ASCII range. But it's also fully transparent to the developer, as all that handling is done internally in the String class and will not be visible from the outside.

answered Sep 22 '22 by Joachim Sauer