In the documentation of the JNI function FindClass, I read this about the name argument: <blockquote> name: a fully-qualified class name (...) The string is encoded in modified UTF-8. </blockquote> According to the documentation, modified UTF-8 has to end with a double '\0': <blockquote> the null character (char)0 is encoded using the two-byte format rather than the one-byte format </blockquote> Does this mean that I should invoke FindClass from C like this: <code>FindClass("java/lang/String\0")</code> i.e. with a double '\0' at the end?

JNI strings and C strings

3 Answers

Character set, encoding and termination are three different things. Obviously, an encoding is designed for a specific character set but a character set can be encoded in multiple ways. And, often, a terminator (if used) is an encoded character, but with modified UTF-8, this is not the case.

Java uses the Unicode character set. For its string and char types, it uses the UTF-16 encoding. The string type is counted; it doesn't use a terminator.

In C, terminated strings are common, as are single-byte encodings of various character sets. C and C++ compilers terminate string literals with the NUL character. In the compiler's destination character set encoding, this is either one or two 0x00 bytes. Almost all common character sets and their encodings have the same byte representation for the non-control ASCII characters. This is true of the UTF-8 encoding of the Unicode character set. (Note, however, that this is not true for characters outside that limited subset.)

The JNI designers opted to use this limited "interoperability" between C strings and modified UTF-8 strings. Many JNI functions accept 0x00-terminated modified UTF-8 strings. These are compatible with what a C compiler would produce from a string literal in the source code, again provided that the characters are limited to non-control ASCII characters. This covers the use case of writing Java package and class names, method names and field names in JNI. (Well, almost: Java allows any Unicode currency symbol in an identifier.)
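As a rough illustration of the "non-control ASCII only" condition, here is a minimal sketch of a check you could run on a name before handing it to JNI. The function name is_plain_ascii is made up for this example, and it assumes the compiler's execution character set is ASCII-compatible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of JNI.
   Returns 1 if every byte of s is a non-control ASCII character
   (0x20..0x7E), 0 otherwise. For such strings the ASCII, UTF-8 and
   modified UTF-8 encodings are byte-for-byte identical. */
int is_plain_ascii(const char *s)
{
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        if (*p < 0x20 || *p > 0x7E)
            return 0;
    }
    return 1;
}
```

A literal such as "java/lang/String" passes this check, so it can be written in the source exactly as it should reach JNI.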

So, you can pass C string literals to JNI functions in a WYSIWYG style. There is no need to add a terminator; the compiler does that. The C compiler would encode an extra '\0' character as 0x00, so adding one wouldn't do any harm, but it isn't necessary.
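To see why the extra '\0' is harmless, compare the two literals directly. This is a small sketch; the names plain_name, doubled_name and same_up_to_terminator are invented for the illustration.

```c
#include <assert.h>
#include <string.h>

/* The literal as you would normally write it: the compiler appends one 0x00. */
const char plain_name[] = "java/lang/String";

/* The literal from the question: an explicit '\0' plus the compiler's own. */
const char doubled_name[] = "java/lang/String\0";

/* Hypothetical helper: returns 1 if code that stops reading at the first
   0x00 byte would see the same bytes in both strings. */
int same_up_to_terminator(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

The second array is one byte larger, but anything that reads up to the first 0x00 byte sees the same 16 bytes in both.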

There are a couple of modifications from the standard UTF-8 encoding. The first lets C functions that expect a 0x00 terminator "handle" modified UTF-8 strings: the NUL character (U+0000) is encoded as the two bytes 0xC0 0x80 instead of the single byte 0x00 that standard UTF-8 would use. That allows a modified UTF-8 string to be laid into a buffer with a 0x00 terminator beyond the bytes of the encoded string. The other modification is a bit esoteric (it concerns how characters outside the Basic Multilingual Plane are encoded), but both modifications make a modified UTF-8 string incompatible with a strictly compliant UTF-8 function.
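Both modifications fall out naturally if you encode UTF-16 code units one at a time. The following is a minimal sketch of such an encoder, not the JVM's implementation; mutf8_encode and its parameters are names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes n UTF-16 code units as modified UTF-8 into
   out (capacity cap bytes), appends a single 0x00 terminator, and returns
   the number of bytes written before the terminator.
   Returns (size_t)-1 if out is too small. */
size_t mutf8_encode(const uint16_t *units, size_t n, char *out, size_t cap)
{
    assert(units != NULL && out != NULL);
    if (cap == 0)
        return (size_t)-1;

    unsigned char *dst = (unsigned char *)out;
    size_t len = 0;
    for (size_t i = 0; i < n; ++i) {
        uint16_t u = units[i];
        /* U+0000 deliberately takes the two-byte form, never 0x00. */
        size_t need = (u >= 0x0001 && u <= 0x007F) ? 1 : (u <= 0x07FF) ? 2 : 3;
        if (cap - len <= need)          /* keep one byte for the terminator */
            return (size_t)-1;
        if (need == 1) {
            dst[len++] = (unsigned char)u;
        } else if (need == 2) {
            dst[len++] = (unsigned char)(0xC0 | (u >> 6));
            dst[len++] = (unsigned char)(0x80 | (u & 0x3F));
        } else {
            /* Surrogates are not combined: each one gets its own 3 bytes. */
            dst[len++] = (unsigned char)(0xE0 | (u >> 12));
            dst[len++] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
            dst[len++] = (unsigned char)(0x80 | (u & 0x3F));
        }
    }
    dst[len] = 0;                       /* the only 0x00 byte in the buffer */
    return len;
}
```

Encoding the three code units 'a', U+0000, 'b' yields the bytes 61 C0 80 62 followed by the terminator, so a C function that scans for 0x00 still sees all four bytes.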

You didn't ask, but there is another use of 0x00-terminated, modified UTF-8 strings in JNI: the GetStringUTFChars and NewStringUTF functions. (The JNI documentation doesn't actually say that GetStringUTFChars returns a 0x00-terminated string, but there are no known JVM implementations that don't. Check your JVM implementor's documentation or source code.) These functions are designed on the same "interoperability" basis. However, the use cases are different, which makes them dangerous. They are generally used to pass Java strings to and from C functions. The C functions, generally, would have no idea what modified UTF-8 is, and possibly not even what UTF-8 or Unicode are.

It is much more direct to use the Java String and Charset classes to convert to and from the character sets and encodings that the C functions are designed for. Often, it is a system setting, user setting, application setting or thread setting that determines which one a C function is using. The Java String class attempts to conform to such settings when not given a specific encoding for a conversion. But in many cases the desired encoding is fixed and can be specified with clear intent.
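If you do end up with modified UTF-8 bytes in C, one defensive measure is to check whether they happen to be valid standard UTF-8 as well before passing them on. The sketch below assumes its input is well-formed modified UTF-8; the name mutf8_is_also_utf8 is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given a well-formed, 0x00-terminated modified UTF-8
   string, returns 1 if the same bytes are also standard UTF-8, and 0 if the
   string uses either of the two modifications. */
int mutf8_is_also_utf8(const char *s)
{
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        /* p[1] is always readable here: at worst it is the terminator. */
        if (p[0] == 0xC0 && p[1] == 0x80)
            return 0;                   /* embedded U+0000 */
        if (p[0] == 0xED && p[1] >= 0xA0 && p[1] <= 0xBF)
            return 0;                   /* individually encoded surrogate */
    }
    return 1;
}
```

When this returns 0, the bytes should not be given to a function that expects strict UTF-8; converting on the Java side with an explicit Charset avoids the problem altogether.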

answered Oct 11 '22 12:10

Tom Blodget

No, you don't encode the terminating zero; it is not part of the class name.

answered Oct 11 '22 14:10

Alex Cohn

No, according to the first reference I found, it means it should be encoded like this:

FindClass("java/lang/String\xc0\x80");
                               ^
                               |
                               |
                      This is not the shortest
                      way to encode the codepoint
                      U+0000, which is why it's
                      "modified" UTF-8.

Note that this assumes that you're really looking for class names whose names end in U+0000, which is rather unlikely. The C string should be terminated just like normal, with a single 0-byte as you get from just:

FindClass("java/lang/String");

The special 2-byte encoding of U+0000 provided by Modified UTF-8 only matters if you want to put U+0000 in a string, and still be able to differentiate it from the C terminator.
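To make that distinction concrete, here is a small sketch that counts how many UTF-16 code units a modified UTF-8 string represents. It relies on the fact that modified UTF-8 has no four-byte forms, so every non-continuation byte starts exactly one code unit. The function name mutf8_code_units is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the UTF-16 code units encoded in a
   well-formed, 0x00-terminated modified UTF-8 string. */
size_t mutf8_code_units(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80)        /* not a continuation byte */
            ++count;
    }
    return count;
}
```

For "java/lang/String" this gives 16, and for "java/lang/String\xc0\x80" it gives 17: the encoded U+0000 counts as a character of the name, while the compiler's terminating 0x00 never does.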

answered Oct 11 '22 14:10

unwind
