Why does Java use modified UTF-8 rather than standard UTF-8 for object serialization and JNI?
One possible explanation is that modified UTF-8 can't have embedded null characters and therefore one can use functions that operate on null-terminated strings with it. Are there any other reasons?
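To make the embedded-null point concrete, here is a small sketch (the class and helper names are mine, not from the question): DataOutputStream.writeUTF uses modified UTF-8, which encodes U+0000 as the two-byte sequence C0 80, so the encoded bytes never contain 0x00 and can be handed to C functions that expect null-terminated strings.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class EmbeddedNullDemo {
    public static void main(String[] args) throws Exception {
        String s = "a\u0000b"; // a string with an embedded null character

        // Standard UTF-8: U+0000 becomes a single 0x00 byte -> 61 00 62
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);

        // Modified UTF-8 (as written by DataOutputStream.writeUTF):
        // U+0000 becomes the overlong two-byte sequence C0 80, so the
        // encoded data never contains a 0x00 byte -> 00 04 61 C0 80 62
        // (the first two bytes are writeUTF's length prefix).
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(s);
        byte[] modified = bos.toByteArray();

        System.out.println(toHex(standard));
        System.out.println(toHex(modified));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
        return sb.toString().trim();
    }
}
```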
Note: Java encodes all Strings into UTF-16, which uses a minimum of two bytes to store code points. Why would we need to convert to UTF-8 then? Not all input might be UTF-16, or UTF-8 for that matter. You might actually receive an ASCII-encoded String, which doesn't support as many characters as UTF-8.
UTF-8 uses one byte to represent code points from 0-127, making the first 128 code points a one-to-one map with ASCII characters, so UTF-8 is backward-compatible with ASCII.
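As a quick illustration of that backward compatibility (a sketch, not part of the original answer): for code points 0-127, US-ASCII and UTF-8 produce identical bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompatDemo {
    public static void main(String[] args) {
        String s = "Hello"; // only code points in the 0-127 range
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8  = s.getBytes(StandardCharsets.UTF_8);
        // Both encodings use one byte per character here, and the bytes match.
        System.out.println(Arrays.equals(ascii, utf8)); // true
    }
}
```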
So the encoding applied is indeed UTF-16, but the character set to which it is applied is a proper subset of the entire Unicode character set, and this guarantees that Java always uses two bytes per character in its internal String encoding. (Note: this is not correct for current Java versions.)
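A quick sketch of why the "two bytes per character" guarantee does not hold: a supplementary code point such as U+1F600 occupies two char values (a surrogate pair), and since Java 9 the JVM may also store Latin-1-only strings with one byte per character internally (compact strings).

```java
public class SurrogatePairDemo {
    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 GRINNING FACE, a single supplementary code point
        System.out.println(s.length());                      // 2 -- two UTF-16 code units (a surrogate pair)
        System.out.println(s.codePointCount(0, s.length())); // 1 -- but only one Unicode code point
    }
}
```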
This string constructor takes a sequence of bytes, which is supposed to be in the encoding that you have given in the second argument, and converts it to the UTF-16 representation of whatever characters those bytes represent in that encoding. But you have given it a sequence of bytes encoded in UTF-8, and told it to interpret that as UTF-16.
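A minimal sketch of that mismatch (the example string is mine): decoding the bytes with the charset they were actually encoded in round-trips correctly, while claiming they are UTF-16 produces garbage.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        byte[] utf8Bytes = "héllo".getBytes(StandardCharsets.UTF_8);

        // Correct: tell the constructor what the bytes actually are.
        String ok = new String(utf8Bytes, StandardCharsets.UTF_8);

        // Wrong: the bytes are UTF-8, but we claim they are UTF-16,
        // so the constructor pairs them up into meaningless code units.
        String garbled = new String(utf8Bytes, StandardCharsets.UTF_16);

        System.out.println(ok);      // héllo
        System.out.println(garbled); // mojibake
    }
}
```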
It is faster and simpler for handling supplementary characters (by not handling them).
Java represents characters as 16-bit chars, but Unicode has evolved to contain more than 64K characters, so some characters, the supplementary characters, have to be encoded in two chars (a surrogate pair) in Java.
Strict UTF-8 requires that the encoder convert surrogate pairs into characters and then encode those characters into bytes. The decoder needs to split supplementary characters back into surrogate pairs.
chars -> character -> bytes -> character -> chars
Since both ends are Java, we can take a shortcut and encode directly at the char level:
char -> bytes -> char
Neither the encoder nor the decoder needs to worry about surrogate pairs.
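A small sketch of the difference (the class name is mine): standard UTF-8 encodes the supplementary code point U+1F600 as a single 4-byte sequence, while DataOutputStream.writeUTF, which uses modified UTF-8, encodes each of its two surrogate chars as a separate 3-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class SupplementaryEncodingDemo {
    public static void main(String[] args) throws Exception {
        String s = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair in Java

        // Standard UTF-8: the code point is reassembled and encoded as F0 9F 98 80 (4 bytes).
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);

        // Modified UTF-8: each surrogate char is encoded on its own as a
        // 3-byte sequence, ED A0 BD ED B8 80 (6 bytes after the 2-byte length prefix).
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(s);
        byte[] modified = bos.toByteArray();

        System.out.println(standard.length); // 4
        System.out.println(modified.length); // 8 = 2-byte length prefix + 6 data bytes
    }
}
```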
I suspect that's the main reason. In C land, having to deal with strings that can contain embedded NULs would complicate things.