Isn't the size of character in Java 2 bytes?

Tags: java, string, char

I used RandomAccessFile to read a byte from a text file.

    public static void readFile(RandomAccessFile fr) throws IOException {
        byte[] cbuff = new byte[1];
        fr.read(cbuff, 0, 1);
        System.out.println(new String(cbuff));
    }

Why does this print one full character when only one byte was read?


asked Feb 22 '11 12:02

Shrinath

2 Answers

A char represents a character in Java (*). It is 2 bytes (16 bits) in size.

That doesn't necessarily mean that every representation of a character is 2 bytes long. In fact, many character encodings use only 1 byte per character (or 1 byte for the most common characters).
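One way to see this is to encode the same one-char String with different charsets and count the resulting bytes. This is a minimal sketch; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes needed to encode the text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String a = "A"; // a single Java char (16 bits in memory)
        System.out.println(encodedLength(a, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(a, StandardCharsets.UTF_8));      // 1
        System.out.println(encodedLength(a, StandardCharsets.UTF_16BE));   // 2
        // Java's plain "UTF-16" encoder prepends a 2-byte byte order mark.
        System.out.println(encodedLength(a, StandardCharsets.UTF_16));     // 4
    }
}
```

The in-memory char is always 16 bits; only the encoded form varies.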

When you call the String(byte[]) constructor, you ask Java to convert the byte[] to a String using the platform's default charset. Since the platform default charset is usually a 1-byte encoding such as ISO-8859-1 or a variable-length encoding such as UTF-8, it can easily convert that 1 byte to a single character.
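A minimal sketch of the same conversion with the charset named explicitly, so the result no longer depends on the platform default. The helper name decode is hypothetical; the byte array stands in for what fr.read fills in for a file starting with an ASCII "A".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeOneByte {
    // Decode bytes with an explicitly named charset rather than the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] cbuff = { 'A' }; // 0x41, as read from an ASCII text file
        System.out.println(decode(cbuff, StandardCharsets.ISO_8859_1)); // A
        System.out.println(decode(cbuff, StandardCharsets.UTF_8));      // A
    }
}
```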

If you run that code on a platform that uses UTF-16 (or UTF-32 or UCS-2 or UCS-4 or ...) as the platform default encoding, then you will not get a valid result (you'll get a String containing the Unicode Replacement Character instead).
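Since the platform default cannot easily be switched at runtime, this sketch simulates a UTF-16 default by passing StandardCharsets.UTF_16 explicitly (an assumption standing in for such a platform). A single byte is only half of a UTF-16 code unit, so the decoder substitutes the replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;

class Utf16OneByte {
    // Decode the bytes as UTF-16, as a UTF-16 platform default would.
    static String decodeAsUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String s = decodeAsUtf16(new byte[] { 'A' });
        // One byte cannot form a complete UTF-16 code unit.
        System.out.println(s.length());         // 1
        System.out.println(s.equals("\uFFFD")); // true
    }
}
```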

That's one of the reasons why you should not depend on the platform default encoding: when converting between byte[] and char[]/String or between InputStream and Reader or between OutputStream and Writer, you should always specify which encoding you want to use. If you don't, then your code will be platform-dependent.
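For the InputStream to Reader case, the charset can be passed to InputStreamReader. This is a minimal sketch; the class name and the in-memory ByteArrayInputStream (used instead of a real file) are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ExplicitReader {
    // Read the first char from a byte array, naming the charset at the
    // InputStream -> Reader boundary instead of relying on the default.
    static int firstChar(byte[] bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return reader.read(); // one char, however many bytes UTF-8 used for it
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00E9".getBytes(StandardCharsets.UTF_8); // e with acute: 0xC3 0xA9
        System.out.println(utf8.length);                          // 2
        System.out.println(Integer.toHexString(firstChar(utf8))); // e9
    }
}
```

Reading byte by byte here would split the two UTF-8 bytes; the Reader returns them as one char.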

(*) That's not entirely true: a char represents a UTF-16 code unit. Either one or two UTF-16 code units represent a Unicode code point. A Unicode code point usually represents a character, but sometimes multiple Unicode code points are used to make up a single character. But the approximation above is close enough to discuss the topic at hand.
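A small sketch of both caveats in the footnote: a code point outside the Basic Multilingual Plane needs two chars (a surrogate pair), and a combining sequence uses two code points for what displays as one character. The helper names are hypothetical.

```java
class CodeUnits {
    // Build a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Count Unicode code points rather than UTF-16 code units (chars).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP: one code point, two chars.
        String emoji = fromCodePoint(0x1F600);
        System.out.println(emoji.length());    // 2
        System.out.println(codePoints(emoji)); // 1

        // "e" + U+0301 COMBINING ACUTE ACCENT: two code points, shown as one character.
        String combined = "e\u0301";
        System.out.println(combined.length());    // 2
        System.out.println(codePoints(combined)); // 2
    }
}
```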


answered Sep 19 '22 16:09

Joachim Sauer

Java stores all its "chars" internally as two bytes. However, when they are converted to bytes (written to a file, sent over a network, etc.), the number of bytes depends on the encoding.
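RandomAccessFile itself shows the difference: readChar() always consumes two bytes and combines them big-endian into one char, whereas the question's code reads one byte and lets a charset decide what it means. This sketch writes "AB" to a temporary file (the file and helper name are illustrative assumptions) and reads it back with readChar().

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadCharDemo {
    // Write the bytes to a temporary file, then read them back with readChar(),
    // which consumes two bytes and combines them big-endian into one char.
    static char readCharFrom(byte[] content) {
        try {
            Path tmp = Files.createTempFile("readchar", ".txt");
            try {
                Files.write(tmp, content);
                try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
                    return raf.readChar();
                }
            } finally {
                Files.delete(tmp); // runs after the file has been closed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char c = readCharFrom(new byte[] { 'A', 'B' }); // 0x41 0x42
        System.out.println(Integer.toHexString(c));     // 4142, not 41
    }
}
```

On an ASCII file, readChar() merges two characters into one unrelated char, which is why reading text usually goes through a charset-aware decoder instead.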

Some characters (ASCII) take a single byte, but many others take several bytes.
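Taking UTF-8 as the example encoding, a minimal sketch of how the byte count grows with the character. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1 (ASCII)
        System.out.println(utf8Length("\u00E9")); // 2 (e with acute accent)
        System.out.println(utf8Length("\u20AC")); // 3 (euro sign)
        System.out.println(utf8Length(new String(Character.toChars(0x1F600)))); // 4 (emoji)
    }
}
```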

Java supports Unicode; according to the Java Character docs, the maximum char value is "\uFFFF" (hex FFFF, decimal 65535), or 11111111 11111111 in binary (two bytes).
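These limits can be checked directly against the constants on java.lang.Character. A minimal sketch; the class and helper names are illustrative, and Character.BYTES requires Java 8 or later.

```java
class CharLimits {
    // Binary form of the largest char value.
    static String maxValueBinary() {
        return Integer.toBinaryString(Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(Character.SIZE);            // 16 (bits)
        System.out.println(Character.BYTES);           // 2
        System.out.println(maxValueBinary());          // 1111111111111111
    }
}
```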

answered Sep 21 '22 16:09

Michael
