Difference between String.length() and String.getBytes().length

Tags:

java

string

I am a beginner teaching myself Java, and I want to know the difference between String.length() and String.getBytes().length.

Which is more suitable for checking the length of a string?

Key asked Apr 29 '13 04:04



2 Answers

String.length()

String.length() is the number of 16-bit UTF-16 code units needed to represent the string. That is, it is the number of char values used to represent the string, and is therefore equal to toCharArray().length. For most characters used in Western languages this is the same as the number of Unicode characters (code points) in the string, but the number of code points will be less than the number of code units if any UTF-16 surrogate pairs are used. Such pairs are needed only to encode characters outside the BMP (Basic Multilingual Plane) and are rare in most writing (emoji are a common exception).
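As a minimal sketch of the code unit versus code point distinction, the class below compares length() with codePointCount() on an ASCII string and on a string containing one emoji. The class and method names (CodeUnitDemo, codeUnits, codePoints) are made up for illustration; only length(), codePointCount() and toCharArray() come from the standard String API.

```java
class CodeUnitDemo {
    // Number of UTF-16 code units (char values); this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair is counted once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "hello";
        // "hi" followed by U+1F600 (an emoji), written as its surrogate pair.
        String emoji = "hi\uD83D\uDE00";

        System.out.println(codeUnits(ascii) + " " + codePoints(ascii)); // 5 5
        System.out.println(codeUnits(emoji) + " " + codePoints(emoji)); // 4 3
        System.out.println(emoji.toCharArray().length);                 // 4
    }
}
```

The emoji lies outside the BMP, so it occupies two char values but counts as a single code point.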

String.getBytes().length

String.getBytes().length, on the other hand, is the number of bytes needed to represent your string in the platform's default encoding. For example, if the default encoding were UTF-16 (rare), it would be exactly 2x the value returned by String.length(), since each 16-bit code unit takes 2 bytes to represent (ignoring any byte-order mark the encoder may prepend). More commonly, your platform encoding will be a variable-width encoding such as UTF-8.

This means the relationship between the two lengths is more complex. For ASCII strings, the two calls will almost always produce the same result (outside of unusual default encodings that don't encode the ASCII subset in 1 byte). For strings containing non-ASCII characters, String.getBytes().length is likely to be larger, as it counts the bytes needed to represent the string, while length() counts 2-byte code units.
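To make the byte counts concrete without depending on whatever the platform default happens to be, the sketch below passes an explicit charset to getBytes(). The class and helper names (ByteLengthDemo, utf8Bytes, utf16Bytes) are illustrative assumptions. UTF_16BE is used rather than UTF_16 because Java's UTF_16 encoder prepends a 2-byte byte-order mark, which would hide the exact 2x relationship.

```java
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Bytes needed to encode s as UTF-8 (1 to 4 bytes per code point).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to encode s as big-endian UTF-16 with no byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";        // 5 ASCII characters
        String euro = "\u20AC";        // euro sign, inside the BMP
        String emoji = "\uD83D\uDE00"; // U+1F600, one surrogate pair

        // Columns: length(), UTF-8 bytes, UTF-16BE bytes
        System.out.println(ascii.length() + " " + utf8Bytes(ascii) + " " + utf16Bytes(ascii)); // 5 5 10
        System.out.println(euro.length() + " " + utf8Bytes(euro) + " " + utf16Bytes(euro));    // 1 3 2
        System.out.println(emoji.length() + " " + utf8Bytes(emoji) + " " + utf16Bytes(emoji)); // 2 4 4
    }
}
```

Calling getBytes() with no argument would give the same numbers as one of these columns only if the platform default matched that charset, which is why code that cares about byte counts usually names the charset explicitly.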

Which is more suitable?

Usually you'll use String.length() in concert with other string methods that take offsets into the string. E.g., to get the last char of a non-empty string, you'd use str.charAt(str.length()-1). You'd only use getBytes().length if for some reason you were dealing with the byte-array encoding returned by getBytes().
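The two use cases can be sketched side by side. Here lastChar indexes with length(), and fitsInUtf8 is a hypothetical helper for a situation where the encoded size matters, such as a field limited to a fixed number of bytes. Both names, and the choice of UTF-8, are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

class LengthUsageDemo {
    // Last char of a non-empty string, indexed via length().
    // Throws StringIndexOutOfBoundsException if s is empty.
    static char lastChar(String s) {
        return s.charAt(s.length() - 1);
    }

    // Hypothetical helper: true if the UTF-8 encoding of s fits in maxBytes.
    static boolean fitsInUtf8(String s, int maxBytes) {
        return s.getBytes(StandardCharsets.UTF_8).length <= maxBytes;
    }

    public static void main(String[] args) {
        String name = "caf\u00E9"; // "cafe" with an acute accent on the e

        System.out.println((int) lastChar(name)); // 233, i.e. U+00E9
        System.out.println(name.length() <= 4);   // true: 4 char values
        System.out.println(fitsInUtf8(name, 4));  // false: 5 bytes in UTF-8
    }
}
```

A check based on length() would accept this string for a 4-byte field even though its UTF-8 form needs 5 bytes, which is the kind of case where the byte count is the right measure.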

BeeOnRope answered Sep 20 '22 20:09


The length() method returns the length of the string in characters (more precisely, in char values, which are UTF-16 code units).

Characters may take more than a single byte. The expression String.getBytes().length returns the length of the string in bytes, using the platform's default character set.

Andy Thomas answered Sep 24 '22 20:09