
String size relation between java and C++

Tags: java, c++, string

I am working on an application built in Java. The Java layer talks to a C++ layer, which builds the SQL queries, runs them against the database, and returns the results to the Java layer.

A simplified example:

On the Java side:

JTextField nameField = new JTextField(20); // 20 chars max length
String name = nameField.getText();         // name is sent to the C++ layer

In the C++ layer, the name from the Java layer is received and stored in a local variable, say cppName. I am confused about how the variables in the C++ layer are declared. Most of them look like this:

char cppName[20*4+1];

I want to know the significance of 20*4+1 here. Why are all variables on the C++ side declared with a size of javaSize*4+1?

Vamsi Emani asked Nov 30 '25 22:11


2 Answers

Are the characters in the Java code Unicode? If so, a single C++ char (one byte) isn't enough to store a Unicode character. An encoded character can take up to 4 bytes, so the ratio is 4:1. The final +1 is for the null terminator.

So you need up to 4 bytes, which is 4 chars, on the C++ side to store a single Java character, and char-based strings in C++ are null-terminated (the last character has to be '\0'), hence 20*4+1.
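To make the arithmetic concrete, here is a minimal Java sketch of the sizing rule. The class and method names (NameBuffer, bufferSize, toCString) are hypothetical and not from the application in the question. It also assumes the string crosses the boundary as standard UTF-8, which the question does not confirm.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NameBuffer {
    // Hypothetical helper: size of the C++ buffer for a field of maxChars Java chars.
    public static int bufferSize(int maxChars) {
        return maxChars * 4 + 1; // up to 4 bytes per character, plus 1 for '\0'
    }

    // Encode as UTF-8 and append a NUL byte, mirroring what a C++ char array would hold.
    public static byte[] toCString(String s, int maxChars) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length + 1 > bufferSize(maxChars)) {
            throw new IllegalArgumentException("string does not fit the C++ buffer");
        }
        // Arrays.copyOf pads with zeros, so the extra last byte is the terminator.
        return Arrays.copyOf(utf8, utf8.length + 1);
    }

    public static void main(String[] args) {
        byte[] c = toCString("\u00C9mile", 20); // 'É' takes 2 bytes, the rest 1 byte each
        System.out.println(bufferSize(20));     // prints 81
        System.out.println(c.length);           // prints 7 (6 bytes of text + terminator)
    }
}
```

Checking the encoded length on the Java side, before the bytes reach C++, is one way to avoid relying on the C++ layer to catch an overrun.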

Luchian Grigore answered Dec 03 '25 14:12


If the String is transferred as UTF-8, each character can turn into up to 4 bytes. As C++ provides no protection if you overrun the memory you reserved, you have to allow for the worst-case size, even if you don't believe you will ever use these characters.

BTW, in Java a String is stored as UTF-16, which means characters above 65535 are supported as code points that take two chars each (a surrogate pair).

http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

The largest code point Java supports turns into 4 bytes when UTF-8 encoded:

StringBuilder sb = new StringBuilder();
sb.appendCodePoint(Character.MAX_CODE_POINT);
System.out.println(sb.toString().getBytes("UTF-8").length); // prints 4

However, that code point takes up two chars in the String. If you take the largest single char value instead, you get 3 bytes. So 4 bytes per char is overly conservative: it takes two chars to produce 4 bytes, so the real worst case is 3 bytes per Java char.

StringBuilder sb = new StringBuilder();
sb.appendCodePoint(Character.MAX_VALUE);
System.out.println(sb.toString().getBytes("UTF-8").length); // prints 3
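As a rough check of the per-char bound, the sketch below fills a 20-char String with each of the two extreme code points and measures the UTF-8 size. The class and helper names (WorstCase, repeat, utf8Length) are made up for illustration, and standard UTF-8 encoding is assumed.

```java
import java.nio.charset.StandardCharsets;

public class WorstCase {
    // Append the code point until the string holds `chars` Java chars
    // (chars should be even when the code point needs a surrogate pair).
    public static String repeat(int codePoint, int chars) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < chars) {
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String bmp = repeat(Character.MAX_VALUE, 20);       // 20 chars, 20 code points
        String supp = repeat(Character.MAX_CODE_POINT, 20); // 20 chars, 10 code points
        System.out.println(bmp.length() + " chars -> " + utf8Length(bmp) + " bytes");   // 20 chars -> 60 bytes
        System.out.println(supp.length() + " chars -> " + utf8Length(supp) + " bytes"); // 20 chars -> 40 bytes
    }
}
```

Under these assumptions a 20-char field never needs more than 60 bytes plus the terminator, so 20*3+1 would also be enough. 20*4+1 simply leaves extra headroom.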
Peter Lawrey answered Dec 03 '25 14:12


