How does the String.codePointCount() function work?

Question

Suppose I have this example:

    public class Main {
        public static void main(String[] args) {
            // Count the code points in the char range [0, 4) of "This"
            System.out.println("This".codePointCount(0, 4));
        }
    }

The output is 4, and if I pass 3 instead of 4 the output is 3. Basically, the output is |0 - 3|, or in general |firstIndex - secondIndex|.

I don't understand how it works. Can you please give an example where the output is different from |firstIndex - secondIndex|?

Thanks

Philip Couling · Accepted Answer

From the Javadoc:

Returns the number of Unicode code points in the specified text range of this String. The text range begins at the specified beginIndex and extends to the char at index endIndex - 1. Thus the length (in chars) of the text range is endIndex-beginIndex. Unpaired surrogates within the text range count as one code point each.
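To make the quoted contract concrete, here is a minimal sketch. The class name and the sample strings are made up for illustration. A "surrogate" is one half of the two-char encoding that the rest of this answer describes; the second string deliberately contains one without its partner.

```java
public class RangeDemo {
    // Hypothetical sample: 'a', then a lone high surrogate (no low surrogate follows it), then 'b'.
    static final String LONE = "a" + "\uD83C" + "b";

    public static void main(String[] args) {
        // ASCII only: every char is one code point, so the count equals endIndex - beginIndex.
        System.out.println("This".codePointCount(0, 4)); // 4
        System.out.println("This".codePointCount(1, 3)); // 2

        // The unpaired surrogate is counted as one code point on its own.
        System.out.println(LONE.length());               // 3
        System.out.println(LONE.codePointCount(0, 3));   // 3
    }
}
```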

Java uses Unicode to represent text (characters). Unicode gives every character a number called a "code point". There are different ways to write these numbers as bytes; Java uses UTF-16, in which a char is 2 bytes. Unfortunately, there are too many characters for 2 bytes, i.e. more (a lot more) than the 65,536 values that 2 bytes can hold.

To get round this, UTF-16 uses 4 bytes (two 2-byte chars) for code points above U+FFFF. These two-char sequences are known as surrogate pairs.

Annoyingly, Java can make this confusing because it treats a 4-byte character as if it were 2 characters (two char values).
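A small sketch of that split, assuming U+1F353 (the strawberry emoji) as the sample code point; the class and constant names are only illustrative. It encodes one code point with Character.toChars and then looks at it char by char and as a whole.

```java
public class SurrogateDemo {
    // U+1F353 STRAWBERRY, a code point above U+FFFF (sample value chosen for illustration).
    static final int STRAWBERRY = 0x1F353;

    public static void main(String[] args) {
        char[] units = Character.toChars(STRAWBERRY); // UTF-16 encoding of the code point
        String s = new String(units);

        System.out.println(units.length);                            // 2 chars for one code point
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D83C DF53

        // charAt sees half a character; codePointAt reads the whole pair.
        System.out.println(s.codePointAt(0) == STRAWBERRY);          // true
    }
}
```

In other words, length() and charAt() work on chars, while the codePoint* methods work on whole Unicode characters.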

Example (credits @Pshemo): "🍓🍑" This string has 2 characters (a strawberry and a peach). Technically it has 2 code points, one for the strawberry and one for the peach. But if you try this out you will see Java says the length is 4, because each one is a "surrogate pair". So "🍓🍑".codePointCount(0, 4) returns 2, which is different from |0 - 4|.
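Here is a runnable sketch of that example. The class name is hypothetical, and the string is built from code points (U+1F353 and U+1F351) rather than typed as emoji, so the result does not depend on the source file's encoding.

```java
public class FruitDemo {
    // U+1F353 STRAWBERRY followed by U+1F351 PEACH.
    static final String FRUIT = new StringBuilder()
            .appendCodePoint(0x1F353)
            .appendCodePoint(0x1F351)
            .toString();

    public static void main(String[] args) {
        System.out.println(FRUIT.length());              // 4 chars
        System.out.println(FRUIT.codePointCount(0, 4));  // 2 code points, not 4
        System.out.println(FRUIT.codePointCount(0, 2));  // 1: just the strawberry
        // The range ends in the middle of the peach, so its lone high surrogate counts as one.
        System.out.println(FRUIT.codePointCount(0, 3));  // 2
        System.out.println(FRUIT.codePoints().count());  // 2 (Java 8+)
    }
}
```

The count only differs from endIndex - beginIndex when the range contains at least one complete surrogate pair.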

For further reading, see https://en.wikipedia.org/wiki/UTF-16, which discusses surrogate pairs as mentioned in the Javadoc.

Tags:

java

string

John Dadi Leop
