Are upper bounds of indexed ranges always assumed to be exclusive?

So in Java, whenever an indexed range is given, the upper bound is almost always exclusive.

From java.lang.String:

substring(int beginIndex, int endIndex)

Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1.

From java.util.Arrays:

copyOfRange(T[] original, int from, int to)

from - the initial index of the range to be copied, inclusive
to - the final index of the range to be copied, exclusive.

From java.util.BitSet:

set(int fromIndex, int toIndex)

fromIndex - index of the first bit to be set.
toIndex - index after the last bit to be set.

As you can see, it does look like Java tries to make it a consistent convention that upper bounds are exclusive.
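As a quick check of the three documented behaviours quoted above, here is a minimal sketch that passes the same (2, 4) bounds to each method. The class name ExclusiveBoundsDemo and its helper methods are made up for illustration; only substring, copyOfRange and BitSet.set come from the JDK.

```java
import java.util.Arrays;
import java.util.BitSet;

class ExclusiveBoundsDemo {
    // Characters at indices 2 and 3; index 4 is excluded.
    static String middle(String s) {
        return s.substring(2, 4);
    }

    // Elements at indices 2 and 3; index 4 is excluded.
    static int[] middle(int[] a) {
        return Arrays.copyOfRange(a, 2, 4);
    }

    // Sets bits 2 and 3; bit 4 stays clear.
    static BitSet bits() {
        BitSet b = new BitSet();
        b.set(2, 4);
        return b;
    }

    public static void main(String[] args) {
        System.out.println(middle("abcdef"));                                         // cd
        System.out.println(Arrays.toString(middle(new int[] {10, 20, 30, 40, 50})));  // [30, 40]
        System.out.println(bits());                                                   // {2, 3}
    }
}
```

In all three cases the range (2, 4) covers exactly 2 elements.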

My questions are:

  • Is this the official authoritative recommendation?
  • Are there notable violations that we should be wary of?
  • Is there a name for this system? (à la "0-based" vs "1-based")

CLARIFICATION: I fully understand that a collection of N objects in a 0-based system is indexed 0..N-1. My question is that if a range (2,4) is given, it can contain either 3 items or 2, depending on the system. What do you call these systems?

AGAIN, the issue is not "first index 0 last index N-1" vs "first index 1 last index N" system; that's known as the 0-based vs 1-based system.

The issue is "There are 3 elements in (2,4)" vs "There are 2 elements in (2,4)" systems. What do you call these, and is one officially sanctioned over the other?

polygenelubricants asked Mar 13 '10 22:03


2 Answers

In general, yes. If you are working in a language with C-like syntax (C, C++, Java), then arrays are zero-indexed, and most random access data structures (vectors, array-lists, etc.) are going to be zero-indexed as well.

Starting indices at zero means that the size of the data structure is always one greater than the last valid index. People often want to know the size of things, of course, so it's more convenient to talk about the size than about the last valid index. People get accustomed to talking about ending indices in an exclusive fashion, because an array a[] that is n elements long has its last valid element in a[n-1].

There is another advantage to using an exclusive index for the ending index, which is that you can compute the size of a sublist by subtracting the inclusive beginning index from the exclusive ending index. If I call myList.subList(3, 7), then I get a sublist with 7 - 3 = 4 elements in it. If the subList() method had used inclusive indices for both ends of the list, then I would need to add an extra 1 to compute the size of the sublist.

This is particularly handy when the starting index is a variable: getting the sublist of myList starting at i that is 5 elements long is just myList.subList(i, i + 5).
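To make the arithmetic concrete, here is a small sketch using java.util.List.subList. The class name SubListDemo and the window helper are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SubListDemo {
    // Hypothetical helper: `length` elements starting at index `start`.
    // With an exclusive end, no +1 or -1 adjustment is needed.
    static <T> List<T> window(List<T> list, int start, int length) {
        return list.subList(start, start + length);
    }

    public static void main(String[] args) {
        List<Integer> myList = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            myList.add(i);
        }

        List<Integer> sub = myList.subList(3, 7);
        System.out.println(sub + " size=" + sub.size());  // [3, 4, 5, 6] size=4

        System.out.println(window(myList, 2, 5));         // [2, 3, 4, 5, 6]
    }
}
```

Note that subList returns a view backed by the original list, not a copy.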

All of that being said, you should always read the API documentation, rather than assuming that a given beginning index or ending index will be inclusive or exclusive. Likewise, you should document your own code to indicate if any bounds are inclusive or exclusive.

Joe Carnahan answered Sep 28 '22 02:09


Credit goes to FredOverflow in his comment saying that this is called the "half-open range". So presumably, Java Collections can be described as "0-based with half-open ranges".

I've compiled some discussions about half-open vs closed ranges elsewhere:


siliconbrain.com - 16 good reasons to use half-open ranges (edited for conciseness):

  • The number of elements in the range [n, m) is just m-n (and not m-n+1).
  • The empty range is [n, n) (and not [n, n-1], which can be a problem if n is an iterator already pointing to the first element of a list, or if n == 0).
  • For floats you can write [13, 42) (instead of [13, 41.999999999999]).
  • The +1 and -1 are almost never used when handling ranges. This is an advantage if they are expensive (as they are for dates).
  • If you write a find over a range, the fact that nothing was found can easily be indicated by returning the end as the found position: if( find( [begin, end) ) == end) nothing found.
  • In languages that start array subscripts at 0 (like C, C++, Java, NCL), the upper bound is equal to the size.
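Several of the points above (the m-n count, the empty range [n, n), and "return end when nothing is found") can be sketched in Java. The HalfOpenFind class and its find method are hypothetical, written here only to illustrate the idea; they are not a JDK API.

```java
class HalfOpenFind {
    // Searches the half-open range a[begin, end).
    // Returns the index of the first match, or `end` itself if there is none.
    static int find(int[] a, int begin, int end, int target) {
        for (int i = begin; i < end; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23, 42};
        int begin = 1;
        int end = 4;  // the range covers 8, 15, 16

        System.out.println(end - begin);                         // 3 elements, no +1 needed
        System.out.println(find(data, begin, end, 15));          // 2
        System.out.println(find(data, begin, end, 42) == end);   // true: 42 lies outside the range
        System.out.println(find(data, 3, 3, 16) == 3);           // true: [3, 3) is empty
    }
}
```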

Half-open versus closed ranges

Advantages of half-open ranges:

  • Empty ranges are valid: [0 .. 0]
  • Easy for subranges to go to the end of the original: [x .. $]
  • Easy to split ranges: [0 .. x] and [x .. $]
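The splitting point translates directly to Java's half-open subList: the same index x serves as the exclusive end of the first part and the inclusive start of the second. This is only a sketch; SplitDemo, head and tail are names invented for the example, and list.size() plays the role of $.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Elements in [0, x).
    static <T> List<T> head(List<T> list, int x) {
        return list.subList(0, x);
    }

    // Elements in [x, size).
    static <T> List<T> tail(List<T> list, int x) {
        return list.subList(x, list.size());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c", "d", "e");
        int x = 2;

        System.out.println(head(list, x));  // [a, b]
        System.out.println(tail(list, x));  // [c, d, e]
        System.out.println(head(list, 0));  // [] (the empty range is valid)
    }
}
```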

Advantages of closed ranges:

  • Symmetry.
  • Arguably easier to read.
  • ['a' ... 'z'] does not require awkward + 1 after 'z'.
  • [0 ... uint.max] is possible.

That last point is very interesting. It's really awkward to write a numberIsInRange(int n, int min, int max) predicate with a half-open range if Integer.MAX_VALUE could legally be in the range.
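A minimal sketch of that awkwardness, assuming the obvious implementations of the predicate. The class RangeCheck and the method names inHalfOpen and inClosed are made up for this example.

```java
class RangeCheck {
    // Half-open: min <= n < max.
    static boolean inHalfOpen(int n, int min, int max) {
        return min <= n && n < max;
    }

    // Closed: min <= n <= max.
    static boolean inClosed(int n, int min, int max) {
        return min <= n && n <= max;
    }

    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;

        // No int upper bound lets the half-open version include MAX_VALUE.
        System.out.println(inHalfOpen(big, 0, big));      // false
        System.out.println(inClosed(big, 0, big));        // true

        // "Just add 1" overflows and wraps to Integer.MIN_VALUE,
        // which silently turns the range into an empty one.
        System.out.println(big + 1 == Integer.MIN_VALUE); // true
        System.out.println(inHalfOpen(5, 0, big + 1));    // false
    }
}
```

One way around this is to widen the bounds to long, at the cost of a different signature from the one in the question.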

polygenelubricants answered Sep 28 '22 02:09