I came across the statement "a char variable is in Unicode format, but adopts / maps well to ASCII also". What is the need to mention that? Of course ASCII is 1 byte and Unicode is 2, and Unicode itself contains the ASCII codes (by default - it's the standard). So are there some languages in which a char variable supports Unicode but not ASCII?
Also, the character format (Unicode/ASCII) is decided by the platform we use, right (UNIX, Linux, Windows, etc.)? So suppose my platform used ASCII, would it not be possible to switch to Unicode, or vice versa?
You CAN'T convert from Unicode to ASCII. Almost every character in Unicode cannot be expressed in ASCII, and those that can be expressed have exactly the same codepoints in ASCII as in UTF-8, which is probably what you have.
It is obvious by now that Unicode represents far more characters than ASCII. ASCII uses a 7-bit range to encode just 128 distinct characters, while Unicode covers more than 150 written scripts.
The major limitation of ASCII is that, with only 7 bits (or 256 values in its 8-bit extensions), it cannot encode the many kinds of characters used around the world. Unicode text is therefore stored using encodings such as UTF-8, UTF-16 and UTF-32, which can represent every Unicode character.
Every ASCII character has an equivalent in Unicode. The difference is that ASCII covers only lowercase letters (a-z), uppercase letters (A-Z), digits (0-9) and symbols such as punctuation marks, while Unicode covers the letters of English, Arabic, Greek and many other scripts.
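As a quick illustration of why the conversion is lossy, here is a minimal sketch (the sample text is made up): encoding a String to US-ASCII in Java silently replaces every character outside the 128-character range with '?', while UTF-8 round-trips the text unchanged.

    import java.nio.charset.StandardCharsets;

    public class AsciiLossDemo {
        public static void main(String[] args) {
            String text = "Grüße";  // contains non-ASCII characters

            // Encoding to US-ASCII silently replaces unmappable characters with '?'
            byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
            System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // Gr??e

            // UTF-8 can represent every Unicode character, so the round trip is lossless
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            System.out.println(new String(utf8, StandardCharsets.UTF_8));     // Grüße
        }
    }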
Java uses Unicode internally. Always. Actually, it uses UTF-16 most of the time, but that's too much detail for now.
It cannot use ASCII internally (for a String, for example). Any String that can be represented in ASCII can also be represented in Unicode, so that should not be a problem.
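A small sketch of what that means for char (the sample characters are arbitrary): each char is a UTF-16 code unit, so it can hold any character in the Basic Multilingual Plane directly, and characters beyond U+FFFF take two chars.

    public class CharDemo {
        public static void main(String[] args) {
            char latin = 'A';   // U+0041, also a valid ASCII value
            char greek = 'π';   // U+03C0, not representable in ASCII, still a single char

            System.out.printf("%c = U+%04X%n", latin, (int) latin); // A = U+0041
            System.out.printf("%c = U+%04X%n", greek, (int) greek); // π = U+03C0

            // Characters above U+FFFF need two chars (a surrogate pair) in UTF-16
            String emoji = "😀";                                          // U+1F600
            System.out.println(emoji.length());                          // 2 UTF-16 code units
            System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
        }
    }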
The only place where the platform comes into play is when Java has to choose an encoding because you didn't specify one. For example, when you create a FileWriter to write String values to a file: at that point Java needs an encoding to specify how each character should be mapped to bytes. If you don't specify one, the default encoding of the platform is used. That default encoding is almost never ASCII. Most Linux platforms use UTF-8, Windows often uses some ISO-8859-* derivatives (or other culture-specific 8-bit encodings), but no current OS uses ASCII (simply because ASCII can't represent a lot of important characters).
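A hedged sketch of the difference (the file names and sample text are just examples; the FileWriter constructor that takes a Charset only exists since Java 11):

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;

    public class EncodingDemo {
        public static void main(String[] args) throws IOException {
            String text = "Grüße";

            // Uses the platform default encoding -- the bytes on disk differ between systems
            try (Writer w = new FileWriter("default.txt")) {
                w.write(text);
            }

            // Java 11+: pass a Charset explicitly so the result is the same everywhere
            try (Writer w = new FileWriter("explicit.txt", StandardCharsets.UTF_8)) {
                w.write(text);
            }
        }
    }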
In fact, pure ASCII is almost irrelevant these days: no one uses it. ASCII is only important as a common subset of the mapping of most 8-bit encodings (including UTF-8): the lower 128 Unicode codepoints map 1:1 to the numeric values 0-127 in many, many encodings. But pure ASCII (where the values 128-255 are undefined) is no longer in active use.
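To see the "common subset" point concretely, this sketch (sample text chosen arbitrarily) encodes a pure-ASCII String with three different charsets and gets byte-for-byte identical results:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class AsciiSubsetDemo {
        public static void main(String[] args) {
            String ascii = "Hello, world!";  // only codepoints 0-127

            byte[] usAscii = ascii.getBytes(StandardCharsets.US_ASCII);
            byte[] utf8    = ascii.getBytes(StandardCharsets.UTF_8);
            byte[] latin1  = ascii.getBytes(StandardCharsets.ISO_8859_1);

            // For text in the ASCII range, all three encodings produce identical bytes
            System.out.println(Arrays.equals(usAscii, utf8));   // true
            System.out.println(Arrays.equals(usAscii, latin1)); // true
        }
    }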
As a side note, Java 9 introduced an internal optimization called "compact strings": Strings that contain only characters representable in Latin-1 use a single byte per character instead of 2. This optimization is very useful for all kinds of "computer speak" such as XML and similar protocols, where the majority of the text is in the ASCII range. But it's also fully transparent to the developer, as all that handling is done internally in the String class and is not visible from the outside.
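The transparency can be sketched like this (the sample strings are arbitrary): the public String API behaves identically whether the JVM stores the characters in 1 byte or 2 bytes each, and toggling the optimization off with the JVM flag -XX:-CompactStrings does not change the program's output.

    public class CompactStringDemo {
        public static void main(String[] args) {
            String latin1Only = "plain ASCII text";  // eligible for 1-byte-per-char storage
            String mixed      = "text with \u03C0";  // contains π, stored as 2 bytes per char

            // Observable behaviour is the same either way; only the internal layout differs
            System.out.println(latin1Only.length());  // 16
            System.out.println(mixed.length());       // 11
            System.out.println(latin1Only.charAt(0)); // p
            System.out.println(mixed.charAt(10));     // π
        }
    }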