Is a Java char array always a valid UTF-16 (Big Endian) encoding?

1 Answer

No. You can create char instances that hold any 16-bit value you like; nothing constrains them to be valid UTF-16 code units, and nothing constrains an array of them to be a valid UTF-16 sequence. Even String does not require its data to be valid UTF-16:

char[] data = {'\uD800', 'b', 'c'};  // Unpaired lead surrogate
String str = new String(data);        // Accepted without any validation
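A minimal sketch confirming that String keeps the lone surrogate unchanged. The class and method names (UnpairedSurrogateDemo, makeUnpaired) are illustrative only. Note that codePointCount and codePointAt do not reject the unpaired surrogate; they treat it as a code point of its own.

```java
public class UnpairedSurrogateDemo {
    // Builds a String whose first char is a lone lead (high) surrogate.
    // Hypothetical helper name, used only for this illustration.
    public static String makeUnpaired() {
        char[] data = {'\uD800', 'b', 'c'};
        return new String(data); // the constructor performs no UTF-16 validation
    }

    public static void main(String[] args) {
        String str = makeUnpaired();
        // All three code units are stored as given
        System.out.println(str.length());                             // 3
        System.out.println(Character.isHighSurrogate(str.charAt(0))); // true
        // The unpaired surrogate is counted as one code point by itself
        System.out.println(str.codePointCount(0, str.length()));      // 3
        System.out.println(Integer.toHexString(str.codePointAt(0)));  // d800
    }
}
```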

The requirements for valid UTF-16 data are set out in Chapter 3 of the Unicode Standard (basically, everything encoded must be a Unicode scalar value, and all surrogates must be correctly paired). You can test whether a char array is a valid UTF-16 sequence, and turn it into a sequence of UTF-16BE (or LE) bytes, by using a CharsetEncoder:

CharsetEncoder encoder = StandardCharsets.UTF_16BE.newEncoder();
ByteBuffer bytes = encoder.encode(CharBuffer.wrap(data)); // throws MalformedInputException if data is not valid UTF-16
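Since every char outside the surrogate range (D800 to DFFF) is already a Unicode scalar value, the Chapter 3 rules reduce to a pairing check on surrogates. The sketch below implements that check by hand with Character.isHighSurrogate and Character.isLowSurrogate, and compares it with the CharsetEncoder approach. The class and method names (Utf16Check, isWellFormed, toUtf16Be, isValidViaEncoder) are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf16Check {
    // Manual pairing check: every high surrogate must be immediately followed
    // by a low surrogate, and no low surrogate may appear on its own.
    public static boolean isWellFormed(char[] data) {
        for (int i = 0; i < data.length; i++) {
            char c = data[i];
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= data.length || !Character.isLowSurrogate(data[i + 1])) {
                    return false; // lead surrogate without a trail surrogate
                }
                i++; // skip the trail half of a valid pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // trail surrogate without a lead surrogate
            }
        }
        return true;
    }

    // Encodes to UTF-16BE (no BOM). A freshly created encoder reports
    // malformed input, so encode() throws instead of substituting.
    public static byte[] toUtf16Be(char[] data) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_16BE.newEncoder();
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(data));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Validity test via the encoder: any coding exception means ill-formed input.
    public static boolean isValidViaEncoder(char[] data) {
        try {
            toUtf16Be(data);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        char[] bad = {'\uD800', 'b', 'c'};
        char[] good = {'a', '\uD83D', '\uDE00'}; // 'a' then U+1F600 as a surrogate pair
        System.out.println(isWellFormed(bad) + " " + isValidViaEncoder(bad));   // false false
        System.out.println(isWellFormed(good) + " " + isValidViaEncoder(good)); // true true
        System.out.println(toUtf16Be(good).length);                             // 6
    }
}
```

The manual loop avoids allocating an output buffer, while the encoder also yields the bytes; for little-endian output, StandardCharsets.UTF_16LE can be used instead.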

(A CharsetDecoder works the same way in the other direction if you start with bytes.)
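For the reverse direction, a sketch along the same lines decodes UTF-16BE bytes with a freshly created CharsetDecoder. Its default action for malformed input is to report it, so decode() throws on an unpaired surrogate or a dangling odd byte rather than substituting U+FFFD. The names Utf16Decode, decodeUtf16Be and isValidUtf16Be are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class Utf16Decode {
    // Strictly decodes big-endian UTF-16 bytes; throws on malformed input.
    public static String decodeUtf16Be(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_16BE.newDecoder();
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Validity test via the decoder: any coding exception means ill-formed input.
    public static boolean isValidUtf16Be(byte[] bytes) {
        try {
            decodeUtf16Be(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = {0x00, 0x61, (byte) 0xD8, 0x3D, (byte) 0xDE, 0x00}; // "a" then U+1F600
        byte[] bad = {(byte) 0xD8, 0x00, 0x00, 0x62};                      // lone lead surrogate, then "b"
        byte[] odd = {0x00, 0x61, 0x00};                                    // trailing half of a code unit
        System.out.println(isValidUtf16Be(good)); // true
        System.out.println(isValidUtf16Be(bad));  // false
        System.out.println(isValidUtf16Be(odd));  // false
    }
}
```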

answered Sep 17 '22 10:09

一二三

Related questions

How to move file from directory A to directory B in remote server?
Is it possible to put ImageView on Canvas in JavaFX?
Generate pdf file dynamically from html template and produce table of contents in java
Three questions about doing lots of calculations
How to get org.mangosdk.spi.ProviderFor dependency for writing a custom Lombok transformation?
Java, compilation error, Constructors
Where is maven-rpm-plugin documentation after codehaus gone
Get an already existing object from another class
Why does this method call fail? (Generics & wildcards)
Inferring a generic type from a generic type in Java (compile time error)
How to return http status code for exceptions in rest services
Java compiler reordering
Android how to get response string from Callback using OkHttp?
Word count with java 8
Servlet - java.lang.IllegalStateException: getWriter() has already been called for this response
How do I lazily concatenate streams?
How does the java compiler know of inherited methods?
Understanding lock scope
Disable findbugs checked bug categories in Gradle build
When should constants be defined in their own files?

Tags: java, arrays, char, character-encoding, unicode

Maarten Bodewes
