Suppose I have this example:
public static void main(String[] args) {
System.out.println("This".codePointCount(0, 4));
}
The output is 4, and if I pass 3 instead of 4 the output is 3. Basically the output is always
3 - 0, or in general |firstIndex - secondIndex|.
I don't understand how this works. Can you please give an example where the output is different from
|firstIndex - secondIndex|?
Thanks
From the javadoc:
Returns the number of Unicode code points in the specified text range of this String. The text range begins at the specified beginIndex and extends to the char at index endIndex - 1. Thus the length (in chars) of the text range is endIndex-beginIndex. Unpaired surrogates within the text range count as one code point each.
Java uses Unicode to represent text. Unicode gives every character a number called a "code point". There are different ways to write these numbers as bytes; Java uses UTF-16, which stores text as 2-byte code units (a Java char is one such unit). Unfortunately, 2 bytes can only hold 65,536 distinct values, and Unicode has far more code points than that (they range up to U+10FFFF).
To get around this, UTF-16 uses 4 bytes (a pair of 2-byte code units) for code points above U+FFFF. These are known as surrogate pairs.
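To see what a surrogate pair looks like, here is a small sketch using the standard Character methods. The class name SurrogatePairDemo and the helper encode are made up for illustration; the strawberry code point U+1F353 is just one example of a code point above U+FFFF.

```java
class SurrogatePairDemo {
    // Hypothetical helper: returns the UTF-16 code units for one code point.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] letter = encode(0x41);    // 'A', fits in a single char
        char[] berry = encode(0x1F353);  // strawberry, above U+FFFF

        System.out.println(letter.length); // 1
        System.out.println(berry.length);  // 2

        // Prints the two code units of the pair: D83C DF53
        System.out.printf("%04X %04X%n", (int) berry[0], (int) berry[1]);

        System.out.println(Character.isHighSurrogate(berry[0])); // true
        System.out.println(Character.isLowSurrogate(berry[1]));  // true

        // Combining the pair gives back the original code point.
        System.out.println(Character.toCodePoint(berry[0], berry[1]) == 0x1F353); // true
    }
}
```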
Annoyingly, Java can make this confusing because it treats such a 4-byte character as two chars, so length() and string indexes count code units rather than code points.
Example (credits @Pshemo): "🍓🍑". This string has 2 characters (a strawberry and a peach), so it has 2 code points, one for the strawberry and one for the peach. But if you try this out you will see that Java says the length is 4, because each one is a "surrogate pair". So codePointCount(0, 4) on this string returns 2, not 4.
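Here is a runnable sketch of that example. The class name CodePointDemo and the constant FRUIT are only for illustration, and the two emoji are written as \u escapes so the result does not depend on the source file encoding.

```java
class CodePointDemo {
    // Strawberry (U+1F353) followed by peach (U+1F351), as UTF-16 escapes.
    static final String FRUIT = "\uD83C\uDF53\uD83C\uDF51";

    public static void main(String[] args) {
        // Plain ASCII: one char per code point, so the count equals the range length.
        System.out.println("This".codePointCount(0, 4)); // 4

        // Two surrogate pairs: 4 chars, but only 2 code points.
        System.out.println(FRUIT.length());                          // 4
        System.out.println(FRUIT.codePointCount(0, FRUIT.length())); // 2

        // The range 0..2 covers exactly the strawberry's pair.
        System.out.println(FRUIT.codePointCount(0, 2)); // 1

        // The range 0..3 cuts the peach in half: the strawberry counts as one
        // code point and the unpaired high surrogate counts as another.
        System.out.println(FRUIT.codePointCount(0, 3)); // 2
    }
}
```

So the result only equals endIndex - beginIndex when every code point in the range fits in a single char.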
For further reading, see https://en.wikipedia.org/wiki/UTF-16, which discusses the surrogate pairs mentioned in the Javadoc.