I came across the statement "a char variable is in Unicode format, but adopts / maps well to ASCII also". What is the need to mention that? Of course ASCII is 1 byte and Unicode is 2, and Unicode itself contains the ASCII codes (by default - it's the standard). So are there some languages in which a char variable supports Unicode but not ASCII?
Also, the character format (Unicode/ASCII) is decided by the platform we use, right (UNIX, Linux, Windows, etc.)? So suppose my platform used ASCII, would it not be possible to switch to Unicode, or vice versa?
You CAN'T convert from Unicode to ASCII. Almost every character in Unicode cannot be expressed in ASCII, and those that can be expressed have exactly the same codepoints in ASCII as in UTF-8, which is probably what you have.
It is obvious by now that Unicode represents far more characters than ASCII. ASCII uses a 7-bit range to encode just 128 distinct characters, while Unicode covers more than 150 written scripts.
The major limitation of ASCII is that, with only 7 bits (or 256 values in its 8-bit extensions), it cannot encode the many kinds of characters used around the world. Unicode text is therefore stored using encodings such as UTF-8, UTF-16 and UTF-32, which can represent every Unicode character.
Every ASCII character has an equivalent in Unicode. The difference is that ASCII covers only lowercase letters (a-z), uppercase letters (A-Z), digits (0-9) and symbols such as punctuation marks, while Unicode covers the letters of English, Arabic, Greek and many other scripts.
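As a quick illustration of why the conversion is lossy, here is a minimal sketch (the sample text is made up): encoding a String to US-ASCII in Java silently replaces every character outside the 128-character range with '?', while UTF-8 round-trips the text unchanged.

    import java.nio.charset.StandardCharsets;

    public class AsciiLossDemo {
        public static void main(String[] args) {
            String text = "Grüße";  // contains non-ASCII characters

            // Encoding to US-ASCII silently replaces unmappable characters with '?'
            byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
            System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // Gr??e

            // UTF-8 can represent every Unicode character, so the round trip is lossless
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            System.out.println(new String(utf8, StandardCharsets.UTF_8));     // Grüße
        }
    }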
Java uses Unicode internally. Always. Actually, it uses UTF-16 most of the time, but that's too much detail for now.
It cannot use ASCII internally (for a String, for example). Any String that can be represented in ASCII can also be represented in Unicode, so that should not be a problem.
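A small sketch of what that means for char (the sample characters are arbitrary): each char is a UTF-16 code unit, so it can hold any character in the Basic Multilingual Plane directly, and characters beyond U+FFFF take two chars.

    public class CharDemo {
        public static void main(String[] args) {
            char latin = 'A';   // U+0041, also a valid ASCII value
            char greek = 'π';   // U+03C0, not representable in ASCII, still a single char

            System.out.printf("%c = U+%04X%n", latin, (int) latin); // A = U+0041
            System.out.printf("%c = U+%04X%n", greek, (int) greek); // π = U+03C0

            // Characters above U+FFFF need two chars (a surrogate pair) in UTF-16
            String emoji = "😀";                                          // U+1F600
            System.out.println(emoji.length());                          // 2 UTF-16 code units
            System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
        }
    }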
The only place where the platform comes into play is when Java has to choose an encoding because you didn't specify one. For example, when you create a FileWriter to write String values to a file: at that point Java needs an encoding to specify how each character should be mapped to bytes. If you don't specify one, the default encoding of the platform is used. That default encoding is almost never ASCII. Most Linux platforms use UTF-8, Windows often uses some ISO-8859-* derivatives (or other culture-specific 8-bit encodings), but no current OS uses ASCII (simply because ASCII can't represent a lot of important characters).
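A hedged sketch of the difference (the file names and sample text are just examples; the FileWriter constructor that takes a Charset only exists since Java 11):

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;

    public class EncodingDemo {
        public static void main(String[] args) throws IOException {
            String text = "Grüße";

            // Uses the platform default encoding -- the bytes on disk differ between systems
            try (Writer w = new FileWriter("default.txt")) {
                w.write(text);
            }

            // Java 11+: pass a Charset explicitly so the result is the same everywhere
            try (Writer w = new FileWriter("explicit.txt", StandardCharsets.UTF_8)) {
                w.write(text);
            }
        }
    }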
In fact, pure ASCII is almost irrelevant these days: no one uses it. ASCII is only important as a common subset of the mapping of most 8-bit encodings (including UTF-8): the lower 128 Unicode codepoints map 1:1 to the numeric values 0-127 in many, many encodings. But pure ASCII (where the values 128-255 are undefined) is no longer in active use.
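To see the "common subset" point concretely, this sketch (sample text chosen arbitrarily) encodes a pure-ASCII String with three different charsets and gets byte-for-byte identical results:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class AsciiSubsetDemo {
        public static void main(String[] args) {
            String ascii = "Hello, world!";  // only codepoints 0-127

            byte[] usAscii = ascii.getBytes(StandardCharsets.US_ASCII);
            byte[] utf8    = ascii.getBytes(StandardCharsets.UTF_8);
            byte[] latin1  = ascii.getBytes(StandardCharsets.ISO_8859_1);

            // For text in the ASCII range, all three encodings produce identical bytes
            System.out.println(Arrays.equals(usAscii, utf8));   // true
            System.out.println(Arrays.equals(usAscii, latin1)); // true
        }
    }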
As a side note, Java 9 introduced an internal optimization called "compact strings": Strings that contain only characters representable in Latin-1 use a single byte per character instead of 2. This optimization is very useful for all kinds of "computer speak" such as XML and similar protocols, where the majority of the text is in the ASCII range. But it's also fully transparent to the developer, as all that handling is done internally in the String class and is not visible from the outside.
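The transparency can be sketched like this (the sample strings are arbitrary): the public String API behaves identically whether the JVM stores the characters in 1 byte or 2 bytes each, and toggling the optimization off with the JVM flag -XX:-CompactStrings does not change the program's output.

    public class CompactStringDemo {
        public static void main(String[] args) {
            String latin1Only = "plain ASCII text";  // eligible for 1-byte-per-char storage
            String mixed      = "text with \u03C0";  // contains π, stored as 2 bytes per char

            // Observable behaviour is the same either way; only the internal layout differs
            System.out.println(latin1Only.length());  // 16
            System.out.println(mixed.length());       // 11
            System.out.println(latin1Only.charAt(0)); // p
            System.out.println(mixed.charAt(10));     // π
        }
    }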