 

Converting jbyteArray to a character array, and then printing to console

I am writing a JNI program where my .cpp file gets a jbyteArray and I want to be able to print the jbyteArray with printf. For that to happen, I believe I have to convert the jbyteArray to a character array.

For background, the Java side of my JNI code converts a String to a byte array, and that byte array is then passed as an argument to my JNI function.

What I've done so far prints the String correctly, but it is followed by junk characters, and I do not know how to get rid of them or whether I am doing something wrong.

Here is what the String is:

dsa

and what prints to console:

dsa,�

The junk characters change depending on what the String is. Here is the relevant part of the code:

.java file:

public class tcr extends javax.swing.JFrame{

static{
    System.loadLibrary("tcr");
}

public native int print(byte file1[]);

    .....

    String filex1 = data1TextField.getText();//gets a filepath in the form of a String from a GUI jtextfield.
    byte file1[]= filex1.getBytes();//convert file path from string to byte array

        tcr t = new tcr();
        t.print(file1);
}

.cpp code:

JNIEXPORT jint JNICALL Java_tcr_print(JNIEnv *env, jobject thisobj, jbyteArray file1){

    jboolean isCopy;
    jbyte* a = env->GetByteArrayElements(file1,&isCopy);
    char* b;
    b = (char*)a;
    printf("%s\n",b);
    return 0; // the function is declared to return jint
}

Any help would be appreciated.

asked Jul 05 '13 14:07 by Sean Sen Wang



2 Answers

Look at what you are doing:

jbyte* a = env->GetByteArrayElements(file1,&isCopy);

a now points to the memory where the bytes of the string are stored. Let's assume the array contains the string "Hello world". In UTF-8 (which for this text is identical to ASCII), those bytes are:

48 65 6c 6c 6f 20 77 6f 72 6c 64

char* b = (char*)a;

b now points to that same memory region. It's a char pointer, so you probably want to use it as a C string. However, that won't work: a C string is a sequence of bytes terminated by a zero byte, and as you can see above, there is no zero byte at the end of this array.

printf("%s\n",b);

This is the problem. Passing b with %s tells printf that b is a C string. It isn't, but printf still prints characters until it reaches a zero byte. So what you see after dsa are bytes from whatever memory follows the byte array, up to the first zero byte that happens to be there. You can fix this by copying the bytes into a buffer that is one byte longer than the byte array and setting the last element to zero.

UPDATE:

You can create the bigger buffer and append the null byte like this:

jsize textLength = env->GetArrayLength(file1); // not strlen(a): the array has no terminator
char* b = (char*)malloc(textLength + 1);        // C++ requires the cast from void*
memcpy(b, a, textLength);
b[textLength] = '\0';

Now b is a valid null-terminated C string. Also, don't forget to call ReleaseByteArrayElements. You can do that right after the memcpy call, because b no longer depends on a. Call free(b) once you are done with it.

answered Nov 05 '22 14:11 by main--



A jbyteArray is actually a very good way to pass a Java String through JNI. It allows you to easily convert the string into the character set and encoding needed by the libraries and files/devices you are using on the C++ side.

Be sure you understand Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".

Java String uses the Unicode character set and UTF-16 encoding (with a platform-dependent byte order).

String.getBytes() converts to the "platform's default charset". So, it is making an assumption about the character set and encoding you need, and what to do about characters that are not in the target character set. You can use other Java String.getBytes overloads or the Charset methods if you want to control these things explicitly.

In deciding which character set and encoding to use, consider that Unicode has been the primary string type in Java, .NET, VB, ... for a couple of decades, is used for compiler source files in Java, ..., and is used generally on the web. Of course, you might be limited by the things you want to interoperate with.

Now, it seems the problem you are facing is either that the target character set is missing characters that your Java String has and a substitute is being used, or the console you are using isn't displaying them properly.

The console (or any app with a UI) has to pick a typeface with which to render the characters, and typefaces generally don't cover all of the more than one million code points available in Unicode. You may be able to change the configuration of your console (or use another one). For example, on Windows you can use cmd.exe or Windows PowerShell. You can change the font in cmd.exe windows and use chcp to change the character set.

UPDATE:

As @main-- points out, if you use a function that expects a terminator appended to the string, then you have to provide it, usually by copying the array, since the JVM retains ownership of the array. This is the actual cause of the behavior in this case. But all of the above is relevant, too.

answered Nov 05 '22 16:11 by Tom Blodget
