
What does "offset" mean in the context of programming?

Does it mean in the beginning or by a distance?

What does the String.offsetByCodePoints(int index, int codePointOffset) method do? What does "unpaired surrogates" in the method documentation mean?

skystar7 asked Oct 19 '10 18:10


3 Answers

An example from Wikipedia: in the string "abcdef", the character 'd' has an offset of three, measured from the character 'a'.
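The same example can be checked in Java, where a character's index in a String is its offset from the first character. This is only a minimal sketch: the class and method names are invented for illustration, and `offsetBetween` assumes both characters actually occur in the string.

```java
// Hypothetical demo class; names are illustrative only.
class OffsetDemo {
    // Offset (in chars) of the first occurrence of c, measured from the start of s.
    // Returns -1 if c does not occur in s.
    static int offsetFromStart(String s, char c) {
        return s.indexOf(c);
    }

    // Offset of 'to' measured from 'from' instead of from the start.
    // Assumes both characters occur in s.
    static int offsetBetween(String s, char from, char to) {
        return s.indexOf(to) - s.indexOf(from);
    }

    public static void main(String[] args) {
        System.out.println(offsetFromStart("abcdef", 'd'));    // 3
        System.out.println(offsetBetween("abcdef", 'b', 'd')); // 2
    }
}
```

The second call shows that an offset does not have to be measured from the beginning: 'd' is at offset 2 from 'b'.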

Offset (computer science)

Sakén answered Oct 12 '22 01:10


What does "offset" mean in the context of programming? Does it mean in the beginning or by a distance?

In general, "offset" means a distance measured from some reference position. That position is often the beginning of something (a string, an array, a file), but it doesn't have to be.

What "offset" specifically means will depend on the context in which it is used. (Ideally, the meaning will be evident from the context.)


What does the String.offsetByCodePoints(int index, int codePointOffset) method do?

This method calculates the index of a specific char within the String: the first char of the Unicode code point that is codePointOffset code points after the position given by index (or before it, if codePointOffset is negative).

(So, in this context "offset" refers to a distance measured in Unicode code points from the position of a given code unit.)

Both index and the result are normal string index values; i.e. they are char positions.

The point is that when you treat a String as a sequence of Unicode code points, your code needs to take into account that a code point may consist of either 1 or 2 char values.
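A small sketch may make the difference between char positions and code point counts concrete. The class name and the sample string are made up for illustration; the string holds 'a', then U+1F600 (a single code point stored as two chars), then 'b'.

```java
// Hypothetical demo class; the sample string is chosen only for illustration.
class CodePointOffsetDemo {
    // 'a' at index 0, U+1F600 at indexes 1-2 (a surrogate pair), 'b' at index 3.
    static final String S = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        System.out.println(S.length());                      // 4 (char values)
        System.out.println(S.codePointCount(0, S.length())); // 3 (code points)
        System.out.println(S.offsetByCodePoints(0, 1));      // 1: skips 'a'
        System.out.println(S.offsetByCodePoints(0, 2));      // 3: skips 'a' and the pair
        System.out.println(S.offsetByCodePoints(3, -1));     // 1: steps back over the pair
    }
}
```

Note that moving two code points forward from index 0 lands on index 3, not 2, because the middle code point occupies two char positions.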

To understand what all of the above means, you may need to do some background reading on Unicode, code points and code units, and also on UTF-16 and how Java models Unicode strings.


What does "unpaired surrogates" in the method documentation mean?

Java strings represent characters whose Unicode code points are greater than 65535 (U+FFFF) as pairs of UTF-16 surrogate chars. In a well-formed UTF-16 string, surrogates always come in pairs: a high surrogate followed by a low surrogate, carrying the high-order and low-order bits of the encoded code point respectively.
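As a rough illustration, the standard `Character` helpers can show how one code point above U+FFFF becomes a surrogate pair and back again. The class name, the `encode` helper and the choice of U+1F600 are assumptions made for this sketch.

```java
// Hypothetical demo class; U+1F600 is just a convenient code point above U+FFFF.
class SurrogatePairDemo {
    // Converts a code point to its UTF-16 char sequence (1 or 2 chars).
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = encode(0x1F600);
        System.out.println(units.length);                        // 2
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
        // Combining the pair recovers the original code point.
        System.out.println(Character.toCodePoint(units[0], units[1]) == 0x1F600); // true
    }
}
```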

The sentence is saying that if a String contains surrogates that are not properly paired, the method treats each of them as a separate code point for the purpose of counting code points.
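To see this counting rule in action, here is a minimal sketch using a deliberately malformed string: a lone high surrogate between 'a' and 'b'. The class name and the string are invented for the example.

```java
// Hypothetical demo class; BROKEN is intentionally not well-formed UTF-16.
class UnpairedSurrogateDemo {
    // 'a', a high surrogate with no low surrogate after it, then 'b'.
    static final String BROKEN = "a\uD83D" + "b";

    public static void main(String[] args) {
        System.out.println(BROKEN.length());                           // 3
        // The lone surrogate counts as one code point of its own.
        System.out.println(BROKEN.codePointCount(0, BROKEN.length())); // 3
        System.out.println(BROKEN.offsetByCodePoints(0, 2));           // 2
        System.out.println(BROKEN.codePointAt(1) == 0xD83D);           // true
    }
}
```

Here every char is counted as its own code point, so char indexes and code point offsets coincide.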

See also: What is a "surrogate pair" in Java?

Stephen C answered Oct 11 '22 23:10


According to the JavaDoc,

String.offsetByCodePoints(int index, int codePointOffset)

Returns the index within this object that is offset from index by codePointOffset code points.

Here is an example of usage...

int num = 0;
num = "Test_String".offsetByCodePoints(0, 2); // num is 2
num = "Test_String".offsetByCodePoints(3, 2); // num is 5
num = "Test_String".offsetByCodePoints(9, 5); // throws IndexOutOfBoundsException: only 2 code points remain after index 9
Ryan Berger answered Oct 11 '22 23:10