
Java native code string ending

Does the string returned from GetStringUTFChars() end with a null terminator? Or do I need to determine the length using GetStringUTFLength() and null-terminate it myself?

Goozo asked May 22 '13 at 14:05



People also ask

How do you end a string in Java?

Be warned that a Java string is not required to end with a '\n' character, so you can't rely on that alone. Strings in Java do NOT have a null terminator as in C, so you need to use the length() method to find out how long a string is.

Is GetStringUTFChars null terminated?

Yes, GetStringUTFChars returns a null-terminated string.

Do Java strings end with null?

Java strings are not terminated with a null character as they are in C or C++. Although a Java string stores its characters in an internal array, there is no terminating null in it. The String class provides the length() method, which returns the number of characters (UTF-16 code units) in the string.

What is native Java code?

In software design, the Java Native Interface (JNI) is a foreign function interface programming framework that enables Java code running in a Java virtual machine (JVM) to call and be called by native applications (programs specific to a hardware and operating system platform) and libraries written in other languages ...


2 Answers

Yes, GetStringUTFChars returns a null-terminated string. However, I don't think you should take my word for it, instead you should find an authoritative online source that answers this question.

Let's start with the actual Java Native Interface Specification itself, where it says:

Returns a pointer to an array of bytes representing the string in modified UTF-8 encoding. This array is valid until it is released by ReleaseStringUTFChars().

Oh, surprisingly it doesn't say whether it's null-terminated or not. Boy, that seems like a huge oversight; fortunately somebody was kind enough to log this bug on Sun's Java bug database back in 2008. The notes on the bug point you to a similar but different documentation bug (which was closed without action), which suggests that readers buy a book, "The Java Native Interface: Programmer's Guide and Specification", as there's a suggestion that this become the new specification for JNI.

But we're looking for an authoritative online source, and this is neither authoritative (it's not yet the specification) nor online.

Fortunately, the reviews for said book on a certain popular online book retailer suggest that the book is freely available online from Sun, and that would at least satisfy the online portion. Sun's JNI web page has a link that looks tantalizingly close, but that link sadly doesn't go where it says it goes.

So I'm afraid I cannot point you to an authoritative online source for this, and you'll have to buy the book (it's actually a good book), where it will explain to you that:

UTF-8 strings are always terminated with the '\0' character, whereas Unicode strings are not. To find out how many bytes are needed to represent a jstring in the UTF-8 format, JNI programmers can either call the ANSI C function strlen on the result of GetStringUTFChars, or call the JNI function GetStringUTFLength on the jstring reference directly.

(Note that in the above sentence, "Unicode" means "UTF-16", or more accurately "the internal two-byte string representation used by Java", though finding proof of that is left as an exercise for the reader.)
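The reason strlen() works on the result is that JNI's modified UTF-8 never contains a zero byte inside the encoded text: the character U+0000 is written as the two bytes 0xC0 0x80, so the first zero byte is always the terminator (when there is one). A minimal sketch, using a hand-built byte array as a hypothetical stand-in for what GetStringUTFChars() would return for the Java string "a\u0000b" on a JVM that terminates its output; the names sample_modified_utf8 and modified_utf8_byte_length are illustrative only:

```c
#include <string.h>

/* Hypothetical bytes standing in for what GetStringUTFChars() returns for
 * the Java string "a\u0000b" on a JVM that terminates its UTF-8 output.
 * Modified UTF-8 encodes U+0000 as the two bytes 0xC0 0x80, so the only
 * zero byte in this buffer is the terminator. */
static const char sample_modified_utf8[] = { 'a', (char)0xC0, (char)0x80, 'b', '\0' };

/* Byte length of a null-terminated modified UTF-8 string. Because the
 * encoding has no embedded zero bytes, strlen() stops exactly at the
 * terminator, giving the same count GetStringUTFLength() reports. */
size_t modified_utf8_byte_length(const char *s) {
    return strlen(s);
}
```

Here modified_utf8_byte_length(sample_modified_utf8) returns 4. With standard UTF-8, where U+0000 is a single 0x00 byte, strlen() would have stopped after 'a' and returned 1.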

Edward Thomson answered Oct 16 '22 at 21:10



All current answers to the question seem to be outdated (Edward Thomson's answer was last updated in 2015) or refer to Android JNI documentation, which is authoritative only in the Android world. The matter was clarified in the recent (2017) clean-up and update of the official Oracle JNI documentation, more specifically in this issue.

Now the JNI specification clearly states:

String Operations

This specification makes no assumptions on how a JVM represent Java strings internally. Strings returned from these operations:

  • GetStringChars()
  • GetStringUTFChars()
  • GetStringRegion()
  • GetStringUTFRegion()
  • GetStringCritical()

are therefore not required to be NULL terminated. Programmers are expected to determine buffer capacity requirements via GetStringLength() or GetStringUTFLength().
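If you follow the specification strictly, the safe pattern is to take the length from GetStringUTFLength() and add the terminator yourself. A minimal sketch of that copy step in plain C, assuming the bytes come from GetStringUTFChars() and the length from GetStringUTFLength() on the same jstring; copy_utf8_terminated is a hypothetical helper name, not a JNI function:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes of modified UTF-8 into a freshly
 * allocated buffer and append '\0', so the result is a valid C string
 * whether or not the source buffer was terminated.
 * Assumed inputs: `bytes` from GetStringUTFChars(), `len` from
 * GetStringUTFLength() on the same jstring.
 * The caller frees the result; returns NULL if allocation fails. */
char *copy_utf8_terminated(const char *bytes, size_t len) {
    char *out = malloc(len + 1);  /* +1 byte for the terminator */
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, bytes, len);      /* never reads past `len` bytes */
    out[len] = '\0';
    return out;
}
```

In a native method you would pass it the pointer from GetStringUTFChars() and the value of GetStringUTFLength(), then call ReleaseStringUTFChars() on the original pointer once the copy is made. Because the helper reads exactly len bytes, it never depends on the JVM having terminated the buffer.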

In the general case this means one should never assume JNI-returned strings are null terminated, not even UTF-8 strings. In a pragmatic world one can test the specific behavior of each supported JVM. In my experience, referring to the JVMs I actually tested:

  • Oracle JVMs do null terminate both UTF-16 (with \u0000) and UTF-8 strings (with '\0');
  • Android JVMs do terminate UTF-8 strings but not UTF-16 ones.
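For the UTF-16 case, which the list above reports as unterminated on Android, the same idea applies to the characters from GetStringChars(). JNI's jchar is an unsigned 16-bit type; in this sketch uint16_t stands in for it (an assumption so the code builds without jni.h), and copy_utf16_terminated is a hypothetical helper name:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `count` UTF-16 code units into a new buffer
 * followed by a single 0 unit. uint16_t stands in for JNI's jchar.
 * Assumed inputs: `units` from GetStringChars(), `count` from
 * GetStringLength() on the same jstring.
 * The caller frees the result; returns NULL if allocation fails. */
uint16_t *copy_utf16_terminated(const uint16_t *units, size_t count) {
    uint16_t *out = malloc((count + 1) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, units, count * sizeof *out);
    out[count] = 0;  /* terminator unit */
    return out;
}
```

Even with the terminator added, keep the length alongside the buffer: a Java string may itself contain U+0000, so a zero unit is not a reliable end marker for UTF-16 data.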
ceztko answered Oct 16 '22 at 21:10
