The JavaDoc says "The null byte '\u0000' is encoded in 2-byte format rather than 1-byte, so that the encoded strings never have embedded nulls."
But what does this even mean? What's an embedded null in this context? I am trying to convert from a Java saved UTF-8 string to "real" UTF-8.
UTF-8 encodes a character into one, two, three, or four bytes; UTF-16 encodes a character into either two or four bytes. The numbers in the names refer to the size of a single code unit: in UTF-8 the smallest representation of a character is one byte (eight bits), in UTF-16 it is two bytes (sixteen bits).
The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
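To make those 16-bit code units concrete, here is a small sketch (the class name is just for the example): a character outside the Basic Multilingual Plane occupies two chars in a Java String even though it is a single code point.

```java
public class Utf16CodeUnits {
    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so UTF-16 represents it as a
        // surrogate pair, i.e. two Java chars (two 16-bit code units).
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                      // 2 -> UTF-16 code units (chars)
        System.out.println(s.codePointCount(0, s.length())); // 1 -> one Unicode code point
    }
}
```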
UTF-8 requires 8, 16, 24, or 32 bits (one to four bytes) to encode a Unicode character; UTF-16 requires either 16 or 32 bits; and UTF-32 always requires 32 bits.
UTF-16 can be more compact where ASCII is not predominant, since it uses just 2 bytes for most characters. UTF-8 needs 3 or more bytes for characters above U+07FF, where UTF-16 still needs only 2.
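If you want to see those sizes directly, something along these lines should work (the sample strings are arbitrary; UTF_16BE is used so the byte count is not inflated by the byte-order mark that UTF_16 prepends):

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        String ascii = "hello";    // ASCII only
        String cjk = "日本語";      // characters above U+07FF

        // ASCII: 1 byte per character in UTF-8, 2 bytes per character in UTF-16
        System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);     // 5
        System.out.println(ascii.getBytes(StandardCharsets.UTF_16BE).length);  // 10

        // CJK: 3 bytes per character in UTF-8, still 2 bytes in UTF-16
        System.out.println(cjk.getBytes(StandardCharsets.UTF_8).length);       // 9
        System.out.println(cjk.getBytes(StandardCharsets.UTF_16BE).length);    // 6
    }
}
```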
In C a string is terminated by the byte value 00.
The point here is that Java strings may contain the character '\u0000', but to avoid confusion when the string is handed to C code (in which native methods are written), that character is encoded differently, namely as the two bytes
11000000 10000000 (0xC0 0x80 in hex)
(according to the javadoc), neither of which is actually 00.
This is a hack to work around something you cannot change easily.
Also note that a lenient decoder will decode this pair to 00, but it is an "overlong" encoding that strict UTF-8 decoders reject, which is why the format is called modified UTF-8 rather than plain UTF-8.
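Putting it together, here is a minimal sketch (the class name and sample string are mine) that writes a string containing '\u0000' with DataOutputStream.writeUTF, prints the modified UTF-8 bytes, and then produces "real" UTF-8 by reading the data back with readUTF and re-encoding the resulting String with the standard UTF-8 charset:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    public static void main(String[] args) throws IOException {
        String s = "A\u0000B";

        // writeUTF emits a 2-byte length followed by the modified UTF-8 bytes
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(s);
        byte[] stored = bos.toByteArray();

        // Payload after the length prefix: 41 C0 80 42 -- no 00 byte anywhere
        for (int i = 2; i < stored.length; i++) {
            System.out.printf("%02X ", stored[i]);
        }
        System.out.println();

        // To get "real" UTF-8, read the data back with readUTF (which understands
        // modified UTF-8) and re-encode the String with a standard charset.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored));
        byte[] realUtf8 = in.readUTF().getBytes(StandardCharsets.UTF_8);

        // Now the null really is a single 00 byte: 41 00 42
        for (byte b : realUtf8) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

If the bytes were produced by something other than writeUTF (so there is no length prefix), the same idea applies: decode with a reader that understands modified UTF-8, or replace each 0xC0 0x80 pair with a single 0x00 byte before handing the data to a standard UTF-8 decoder. Keep in mind that modified UTF-8 also encodes supplementary characters differently (as two 3-byte surrogate sequences), so a plain byte replacement only covers the null case.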