Character size in Java vs. C

Why does a character in Java take twice as much space to store as a character in C?

ion3023 asked Feb 19 '12

People also ask

What is the char size in Java?

char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
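A quick check of those bounds (a throwaway sketch; the class name is just for illustration):

    public class CharBounds {
        public static void main(String[] args) {
            char min = Character.MIN_VALUE;  // '\u0000'
            char max = Character.MAX_VALUE;  // '\uffff'
            System.out.println((int) min + " " + (int) max);  // 0 65535
        }
    }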

What is the size of the char data type in Java, and why does it differ from C?

char in C is one byte, but it is not guaranteed to be signed or unsigned. In Java it is 2 bytes and guaranteed to be unsigned, to support characters from 0 to 65535. Actually, in C the standard byte type is char.

What is the size of 1 character in C?

Char Size. The size of both unsigned and signed char is always 1 byte, irrespective of which compiler we use.

Why is a Java character 2 bytes?

Java supports many international languages, so it uses 2 bytes per character: 1 byte of memory is not sufficient to store all the characters and symbols those languages use. Java supports Unicode, whereas C's char was designed around ASCII.


1 Answer

In Java, characters are 16-bit; in C, they are 8-bit.
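You can verify the Java side directly (a minimal sketch; Character.BYTES requires Java 8+, and the class name is just for illustration):

    public class CharSize {
        public static void main(String[] args) {
            System.out.println(Character.SIZE);   // 16 (bits per char)
            System.out.println(Character.BYTES);  // 2  (bytes per char, Java 8+)
            // In C, by contrast, sizeof(char) is 1 by definition.
        }
    }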

A more general question is: why is this so?

To find out why, you need to look at the history and draw your own conclusions.

When C was developed in the USA, ASCII was the de facto standard there, and you really only needed 7 bits; with 8 you could handle some non-ASCII characters as well. That seemed more than enough. Many text-based protocols, like SMTP (email), XML and FIX, still use only ASCII characters; email and XML encode non-ASCII characters as escape sequences. Binary files, sockets and streams are still natively 8-bit bytes.
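You can see the one-byte-per-ASCII-character assumption from Java too (a small sketch using only the standard library; the class name is illustrative):

    import java.nio.charset.StandardCharsets;

    public class AsciiBytes {
        public static void main(String[] args) {
            String s = "hello";  // pure ASCII
            byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
            System.out.println(ascii.length);   // 5: one byte per character
            byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
            System.out.println(utf16.length);   // 10: two bytes per char, Java's native width
        }
    }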

BTW: C can support wider characters via wchar_t, but that is not plain char.

When Java was developed, 16 bits seemed like enough to support most languages. Since then, Unicode has been extended to characters above 65535, and Java has had to add support for code points, which are encoded in UTF-16 as one or two 16-bit char values.
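For example, a character above 65535 occupies two chars in a String (a minimal sketch; U+1F600 is an emoji outside the 16-bit range, and the class name is illustrative):

    public class CodePoints {
        public static void main(String[] args) {
            String s = "A\uD83D\uDE00";  // 'A' plus U+1F600, stored as a surrogate pair
            System.out.println(s.length());                       // 3 chars (UTF-16 code units)
            System.out.println(s.codePointCount(0, s.length()));  // 2 actual characters
            System.out.println(Character.charCount(s.codePointAt(1)));  // 2: one code point, two chars
        }
    }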

So making byte an 8-bit type and char an unsigned 16-bit value made sense at the time.
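To see what signed vs. unsigned means in practice (a minimal sketch; the class name is illustrative):

    public class Signedness {
        public static void main(String[] args) {
            byte b = (byte) 0xFF;
            char c = (char) -1;
            System.out.println(b);        // -1     (byte is a signed 8-bit type)
            System.out.println((int) c);  // 65535  (char is an unsigned 16-bit type)
        }
    }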

BTW: If your JVM supports -XX:+UseCompressedStrings, it can use bytes instead of chars for Strings that contain only 8-bit characters. (That flag existed in some Java 6 HotSpot builds; from Java 9, compact strings give the same saving by default.)

Peter Lawrey answered Sep 19 '22