What does "offset" mean in the context of programming?
Does it mean in the beginning or by a distance?
What does the String.offsetByCodePoints(int index, int codePointOffset) method do?
What does "unpaired surrogates" in the method documentation mean?
An example from Wikipedia: in the string "abcdef", the character 'd' has an offset of three from the character 'a'.
(Wikipedia: Offset (computer science))
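A small Java sketch of the "abcdef" example, assuming zero-based char indexes as used by java.lang.String. The charAtOffset helper is hypothetical and exists only to make the base position explicit: measured from 'a' (index 0), 'd' is at offset 3, while measured from 'b' the same 'd' is at offset 2.

```java
class OffsetBasics {
    // Hypothetical helper: the char that lies `offset` positions after `base`.
    static char charAtOffset(String s, int base, int offset) {
        return s.charAt(base + offset);
    }

    public static void main(String[] args) {
        String s = "abcdef";
        // Measured from the beginning (index 0), 'd' is at offset 3.
        System.out.println(s.indexOf('d'));        // 3
        System.out.println(charAtOffset(s, 0, 3)); // d
        // Measured from 'b' (index 1), the same 'd' is at offset 2.
        System.out.println(charAtOffset(s, 1, 2)); // d
    }
}
```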
What does "offset" mean in the context of programming? Does it mean in the beginning or by a distance?
In general, "offset" means some form of distance measured from some given position. That position could be the beginning of something, but it isn't necessarily.
What "offset" specifically means will depend on the context in which it is used. (Ideally, the meaning will be evident from the context.)
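Even within the JDK, what an "offset" is measured from depends on the API. A sketch using two real java.lang.String members (the wrapper names slice and sameRegion are hypothetical): in the String(char[] value, int offset, int count) constructor the offset is an index into the char array, while in regionMatches(int toffset, String other, int ooffset, int len) each offset is measured from the start of its own string.

```java
class OffsetInJdkApis {
    // Hypothetical wrapper: in String(char[], int offset, int count),
    // offset is an index into the char array, not into any string.
    static String slice(char[] buffer, int offset, int count) {
        return new String(buffer, offset, count);
    }

    // Hypothetical wrapper: in regionMatches, toffset is measured in `s`
    // and ooffset is measured in `other`, each from its own start.
    static boolean sameRegion(String s, int toffset, String other, int ooffset, int len) {
        return s.regionMatches(toffset, other, ooffset, len);
    }

    public static void main(String[] args) {
        char[] buffer = {'x', 'x', 'h', 'i', '!'};
        System.out.println(slice(buffer, 2, 3)); // hi!
        // "world" starts at offset 7 in the first string and offset 4 in the second.
        System.out.println(sameRegion("Hello, world", 7, "the world", 4, 5)); // true
    }
}
```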
What does the String.offsetByCodePoints(int index, int codePointOffset) method do?
This method calculates the position of a specific char within the String. The char will be the first char of the Unicode code point that is codePointOffset code points after the position given by index.
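To see why counting in code points differs from simply adding to a char index, here is a minimal sketch using a string that contains one supplementary character (U+1F600, chosen arbitrarily), which Java stores as two chars.

```java
class OffsetByCodePointsDemo {
    // 'a', U+1F600 (two chars: a high and a low surrogate), 'b'.
    // length() is 4, but the string holds only 3 code points.
    static final String S = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        // Plain char arithmetic: index 0 + 2 lands on the low surrogate.
        System.out.println(Integer.toHexString(S.charAt(0 + 2))); // de00
        // Counting code points: two code points after index 0 is 'b', at char index 3.
        int i = S.offsetByCodePoints(0, 2);
        System.out.println(i);           // 3
        System.out.println(S.charAt(i)); // b
    }
}
```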
(So, in this context, "offset" refers to a distance measured in Unicode code points from the position of a given code unit.)
Both index and the result are normal string index values; i.e. they are char positions.
The point ... is that when you treat a String as a sequence of Unicode code points, your code needs to account for the fact that a code point may consist of either 1 or 2 char values.
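A sketch of the "1 or 2 char values" point, using only standard String and Character methods: length() counts chars, codePointCount counts code points, and Character.charCount reports how many chars a given code point needs. The test string is an arbitrary choice.

```java
import java.util.Arrays;

class CodePointWidths {
    // For each code point in s, how many chars (UTF-16 code units) it occupies.
    static int[] charCounts(String s) {
        return s.codePoints().map(Character::charCount).toArray();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(s.length());                      // 4
        System.out.println(s.codePointCount(0, s.length())); // 3
        System.out.println(Arrays.toString(charCounts(s)));  // [1, 2, 1]
    }
}
```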
To understand what all of the above means, you may need to do some background reading on Unicode, code points and code units, and also on UTF-16 and how Java models Unicode strings.
What does "unpaired surrogates" in the method documentation mean?
Java strings represent Unicode code points above 65535 (U+FFFF) as pairs of UTF-16 surrogate chars. In a well-formed UTF-16 string, surrogates always come in pairs: a high surrogate followed by a low surrogate, which together encode the high- and low-order bits of the code point.
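The pairing can be made concrete with the standard UTF-16 arithmetic: subtract 0x10000 to get a 20-bit value, put the top 10 bits in a high surrogate (0xD800 to 0xDBFF) and the bottom 10 bits in a low surrogate (0xDC00 to 0xDFFF). The toSurrogates helper below is a hypothetical illustration; in real code Character.toChars does this for you, and the sketch cross-checks against it.

```java
import java.util.Arrays;

class SurrogatePairs {
    // Hypothetical helper: encodes a supplementary code point (above U+FFFF)
    // as a UTF-16 high/low surrogate pair.
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;              // a 20-bit value
        char high = (char) (0xD800 + (v >>> 10)); // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF)); // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char[] pair = toSurrogates(cp);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00
        // Cross-check against the JDK, then recombine the pair.
        System.out.println(Arrays.equals(pair, Character.toChars(cp)));    // true
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == cp); // true
    }
}
```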
The sentence is saying that if a String contains surrogates that are not properly paired, the method treats each unpaired surrogate as a separate code point for the purpose of counting code points.
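A minimal sketch of how unpaired surrogates are counted. The strings are contrived: one has a proper pair, one has a lone high surrogate, and one has a low surrogate before a high one (which is not a valid pair). The count helper is a hypothetical wrapper around codePointCount.

```java
class UnpairedSurrogates {
    // Hypothetical wrapper: number of code points in the whole string.
    static int count(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String paired = "a\uD83D\uDE00b";   // high then low: one code point
        String unpaired = "a\uD83Db";       // lone high surrogate
        String reversed = "a\uDE00\uD83Db"; // low then high: not a valid pair

        System.out.println(count(paired));   // 3
        System.out.println(count(unpaired)); // 3
        System.out.println(count(reversed)); // 4

        // offsetByCodePoints also steps over each unpaired surrogate as one code point.
        System.out.println(unpaired.offsetByCodePoints(0, 2)); // 2
        System.out.println(reversed.offsetByCodePoints(0, 2)); // 2
    }
}
```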
See also: What is a "surrogate pair" in Java?
According to the Javadoc, String.offsetByCodePoints(int index, int codePointOffset):
"Returns the index within this object that is offset from index by codePointOffset code points."
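To make the Javadoc sentence concrete, here is a hypothetical re-implementation that steps one code point at a time. It illustrates the documented behavior and is not the JDK source; it relies on codePointAt and codePointBefore returning a lone surrogate as itself, so an unpaired surrogate advances the index by one char.

```java
class OffsetByCodePointsSketch {
    // Hypothetical re-implementation for illustration; real code should call
    // String.offsetByCodePoints. Moves forward for a positive offset and
    // backward for a negative one, one code point per step.
    static int offsetByCodePoints(String s, int index, int codePointOffset) {
        if (index < 0 || index > s.length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int i = index;
        for (int n = 0; n < codePointOffset; n++) {
            if (i >= s.length()) {
                throw new IndexOutOfBoundsException("ran past the end");
            }
            // A valid pair yields a supplementary code point (2 chars);
            // anything else, including an unpaired surrogate, is 1 char.
            i += Character.charCount(s.codePointAt(i));
        }
        for (int n = 0; n > codePointOffset; n--) {
            if (i <= 0) {
                throw new IndexOutOfBoundsException("ran past the start");
            }
            i -= Character.charCount(s.codePointBefore(i));
        }
        return i;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        for (int k = 0; k <= 3; k++) {
            // Both columns should agree: sketch vs. JDK.
            System.out.println(offsetByCodePoints(s, 0, k) + " " + s.offsetByCodePoints(0, k));
        }
        System.out.println(offsetByCodePoints(s, s.length(), -2)); // 1
    }
}
```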
Here is an example of usage...
int num = 0;
num = "Test_String".offsetByCodePoints(0, 2); //num is 2
num = "Test_String".offsetByCodePoints(3, 2); //num is 5
num = "Test_String".offsetByCodePoints(9, 5); //Throws IndexOutOfBoundsException: the offset would go past the end of the string
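A runnable version of the usage example, also showing that a negative codePointOffset moves backwards. The offsetOrMinusOne wrapper is hypothetical; it only shows where the IndexOutOfBoundsException comes from and one way a caller might handle it.

```java
class OffsetUsage {
    // Hypothetical helper: returns -1 instead of throwing when the offset
    // would move past either end of the string.
    static int offsetOrMinusOne(String s, int index, int codePointOffset) {
        try {
            return s.offsetByCodePoints(index, codePointOffset);
        } catch (IndexOutOfBoundsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // 11 chars, none of them surrogates, so here 1 char == 1 code point.
        String s = "Test_String";
        System.out.println(s.offsetByCodePoints(0, 2));  // 2
        System.out.println(s.offsetByCodePoints(3, 2));  // 5
        System.out.println(s.offsetByCodePoints(5, -2)); // 3 (negative offsets move backwards)
        System.out.println(offsetOrMinusOne(s, 9, 5));   // -1: 9 + 5 = 14 is past the end (11)
    }
}
```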