 

Why doesn't Java have true multidimensional arrays?

The TL;DR version, for those who don't want the background, is the following specific question:

Question

Why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Background

Java has multidimensional arrays at the syntax level, in that one can declare

int[][] arr = new int[10][10]; 

but it seems that this is really not what one might have expected. Rather than having the JVM allocate a contiguous block of RAM big enough to store 100 ints, it comes out as an array of arrays of ints: so each layer is a contiguous block of RAM, but the thing as a whole is not. Accessing arr[i][j] is thus rather slow: the JVM has to

  1. find the int[] stored at arr[i];
  2. index this to find the int stored at arr[i][j].

This involves querying an object to go from one layer to the next, which is rather expensive.

Why Java does this

At one level, it's not hard to see why this can't be optimised to a simple scale-and-add lookup even if it were all allocated in one fixed block. The problem is that arr[3] is a reference all of its own, and it can be changed. So although arrays are of fixed size, we could easily write

arr[3] = new int[11]; 

and now the scale-and-add is screwed because this layer has grown. You'd need to know at runtime whether everything is still the same size as it used to be. In addition, of course, this will then get allocated somewhere else in RAM (it'll have to be, since it's bigger than what it's replacing), so it's not even in the right place for scale-and-add.

What's problematic about it

It seems to me that this is not ideal, for two reasons.

For one, it's slow. A test I ran with these methods for summing the contents of a single-dimensional or multidimensional array took nearly twice as long (714 seconds vs 371 seconds) in the multidimensional case (an int[1000000] and an int[100][100][100] respectively, filled with random int values, run 1000000 times with a warm cache).

public static long sumSingle(int[] arr) {
    long total = 0;
    for (int i = 0; i < arr.length; i++)
        total += arr[i];
    return total;
}

public static long sumMulti(int[][][] arr) {
    long total = 0;
    for (int i = 0; i < arr.length; i++)
        for (int j = 0; j < arr[0].length; j++)
            for (int k = 0; k < arr[0][0].length; k++)
                total += arr[i][j][k];
    return total;
}

Secondly, because it's slow, it encourages obscure coding. If you encounter something performance-critical that would naturally be done with a multidimensional array, you have an incentive to write it as a flat array instead, even if that makes the code unnatural and hard to read. You're left with an unpalatable choice: obscure code or slow code.
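To make the trade-off concrete, here is a sketch of the "obscure" flat-array style being described: a 100x100 grid stored as one int[10000] and indexed by hand. The class and method names are illustrative, not from the question.

```java
public class FlatSum {
    static final int ROWS = 100, COLS = 100;

    // The natural arr[i][j] becomes flat[i * COLS + j]: a single
    // scale-and-add, with no intermediate array object to chase.
    public static long sum(int[] flat) {
        long total = 0;
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                total += flat[i * COLS + j];
        return total;
    }

    public static void main(String[] args) {
        int[] flat = new int[ROWS * COLS];
        flat[3 * COLS + 7] = 42;   // "element (3, 7)" -- fast, but not readable
        System.out.println(sum(flat)); // prints 42
    }
}
```

The index arithmetic is exactly the scale-and-add lookup the JVM can't apply to arrays of arrays, which is why performance-sensitive code ends up written this way.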

What could be done about it

It seems to me that the basic problem could easily enough be fixed. The only reason, as we saw earlier, that it can't be optimised is that the structure might change. But Java already has a mechanism for making references unchangeable: declare them as final.

Now, just declaring it with

final int[][] arr = new int[10][10]; 

isn't good enough because it's only arr that is final here: arr[3] still isn't, and could be changed, so the structure might still change. But if we had a way of declaring things so that it was final throughout, except at the bottom layer where the int values are stored, then we'd have an entire immutable structure, and it could all be allocated as one block, and indexed with scale-and-add.

How it would look syntactically, I'm not sure (I'm not a language designer). Maybe

final int[final][] arr = new int[10][10]; 

although admittedly that looks a bit weird. This would mean: final at the top layer; final at the next layer; not final at the bottom layer (else the int values themselves would be immutable).

Finality throughout would enable the JIT compiler to optimise this to give performance comparable to that of a single-dimensional array, which would then take away the temptation to code that way just to get round the slowness of multidimensional arrays.

(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)

Question

So why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Update

A bizarre side note: the difference in timings drops away to only a few percent if you use an int for the running total rather than a long. Why would there be such a small difference with an int, and such a big difference with a long?

Benchmarking code

Code I used for benchmarking, in case anyone wants to try to reproduce these results:

import java.util.Random;

public class Multidimensional {

    public static long sumSingle(final int[] arr) {
        long total = 0;
        for (int i = 0; i < arr.length; i++)
            total += arr[i];
        return total;
    }

    public static long sumMulti(final int[][][] arr) {
        long total = 0;
        for (int i = 0; i < arr.length; i++)
            for (int j = 0; j < arr[0].length; j++)
                for (int k = 0; k < arr[0][0].length; k++)
                    total += arr[i][j][k];
        return total;
    }

    public static void main(String[] args) {
        final int iterations = 1000000;

        Random r = new Random();
        int[] arr = new int[1000000];
        for (int i = 0; i < arr.length; i++)
            arr[i] = r.nextInt();
        long total = 0;
        System.out.println(sumSingle(arr));
        long time = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            total = sumSingle(arr);
        time = System.nanoTime() - time;
        System.out.printf("Took %d ms for single dimension (total %d)%n", time / 1000000, total);

        int[][][] arrMulti = new int[100][100][100];
        for (int i = 0; i < arrMulti.length; i++)
            for (int j = 0; j < arrMulti[i].length; j++)
                for (int k = 0; k < arrMulti[i][j].length; k++)
                    arrMulti[i][j][k] = r.nextInt();
        System.out.println(sumMulti(arrMulti));
        time = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            total = sumMulti(arrMulti);
        time = System.nanoTime() - time;
        System.out.printf("Took %d ms for multi dimension (total %d)%n", time / 1000000, total);
    }
}
asked Oct 11 '14 by chiastic-security



1 Answer

but it seems that this is really not what one might have expected.

Why?

Consider that the form T[] means "array of type T". Just as we would expect int[] to mean "array of type int", we would expect int[][] to mean "array of type array of type int", because there's no less reason for int[] to serve as the T than int.

As such, considering that one can have arrays of any type, it follows just from the way [ and ] are used in declaring and initialising arrays (and for that matter, {, } and ,), that without some sort of special rule banning arrays of arrays, we get this sort of use "for free".

Now consider also that there are things we can do with jagged arrays that we can't do otherwise:

  1. We can have "jagged" arrays where different inner arrays are of different sizes.
  2. We can have null arrays within the outer array where that's appropriate to the mapping of the data, or perhaps to allow lazy building.
  3. We can deliberately alias within the array so e.g. lookup[1] is the same array as lookup[5]. (This can allow for massive savings with some data-sets, e.g. many Unicode properties can be mapped for the full set of 1,112,064 code points in a small amount of memory because leaf arrays of properties can be repeated for ranges with matching patterns).
  4. Some heap implementations can handle many smaller objects better than one large object in memory.
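The first three of these can be shown in a few lines; the snippet below (names illustrative) demonstrates jagged rows, a null row, and deliberate aliasing of one leaf array from two slots:

```java
public class JaggedDemo {
    public static void main(String[] args) {
        int[][] jagged = new int[4][];
        jagged[0] = new int[2];        // rows of different sizes
        jagged[1] = new int[7];
        jagged[2] = null;              // a row left unbuilt (lazy building)
        jagged[3] = jagged[0];         // aliasing: two slots, one array

        jagged[0][1] = 99;
        System.out.println(jagged[3][1]);     // prints 99: same underlying array
        System.out.println(jagged[1].length); // prints 7
    }
}
```

None of these shapes can be represented by a single contiguous rank-2 block, which is the structural flexibility a "true" multidimensional array would give up.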

There are certainly cases where this sort of multidimensional array is useful.

Now, the default state of any feature is unspecified and unimplemented. Someone needs to decide to specify and implement a feature, or else it wouldn't exist.

As shown above, the array-of-arrays sort of multidimensional array will exist unless someone decides to introduce a special feature banning arrays of arrays. Since arrays of arrays are useful for the reasons above, that would be a strange decision to make.

Conversely, the sort of multidimensional array where an array has a defined rank that can be greater than 1 and so be used with a set of indices rather than a single index, does not follow naturally from what is already defined. Someone would need to:

  1. Decide on how the declaration, initialisation and use would work.
  2. Document it.
  3. Write the actual code to do this.
  4. Test the code to do this.
  5. Handle the bugs, edge-cases, reports of bugs that aren't actually bugs, backward-compatibility issues caused by fixing the bugs.

Also users would have to learn this new feature.

So, it has to be worth it. Some things that would make it worth it would be:

  1. If there was no way of doing the same thing.
  2. If the way of doing the same thing was strange or not well-known.
  3. People would expect it from similar contexts.
  4. Users can't provide similar functionality themselves.

In this case though:

  1. But there is.
  2. Using strides within arrays was already known to C and C++ programmers, and Java built on their syntax, so the same techniques are directly applicable.
  3. Java's syntax was based on C++, and C++ similarly only has direct support for multidimensional arrays as arrays-of-arrays. (Except when statically allocated, but that's not something that would have an analogy in Java where arrays are objects).
  4. One can easily write a class that wraps an array and details of stride-sizes and allows access via a set of indices.
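As a sketch of point 4, here is a minimal version of such a wrapper for rank 2: a flat array plus stride arithmetic, exposed through indexed accessors. The class and method names are illustrative, not from any standard library.

```java
public class Int2D {
    private final int[] data;
    private final int rows, cols;

    public Int2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new int[rows * cols]; // one contiguous block
    }

    public int get(int i, int j) {
        checkBounds(i, j);
        return data[i * cols + j];        // scale-and-add lookup
    }

    public void set(int i, int j, int value) {
        checkBounds(i, j);
        data[i * cols + j] = value;
    }

    private void checkBounds(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols)
            throw new IndexOutOfBoundsException(i + "," + j);
    }
}
```

Usage is `new Int2D(10, 10)` followed by `m.set(3, 7, 42)` and `m.get(3, 7)`: the caller gets multi-index access while the storage stays one contiguous block, at the cost of a method call the JIT can usually inline.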

Really, the question is not "Why doesn't Java have true multidimensional arrays?" but "Why should it?"

Of course, the points you made in favour of multidimensional arrays are valid, and some languages do have them for that reason, but the burden is nonetheless to argue a feature in, not argue it out.

(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)

Like many rumours, there's an element of truth here, but it is not the full truth.

.NET arrays can indeed have multiple ranks. This is not the only way in which it is more flexible than Java. Each rank can also have a lower-bound other than zero. As such, you could for example have an array that goes from -3 to 42 or a two dimensional array where one rank goes from -2 to 5 and another from 57 to 100, or whatever.

C# does not give complete access to all of this from its built-in syntax (you need to call Array.CreateInstance() for lower bounds other than zero), but it does allow you to use the syntax int[,] for a two-dimensional array of int, int[,,] for a three-dimensional array, and so on.

Now, the extra work involved in dealing with lower bounds other than zero adds a performance burden, and yet these cases are relatively uncommon. For that reason single-rank arrays with a lower-bound of 0 are treated as a special case with a more performant implementation. Indeed, they are internally a different sort of structure.

In .NET, multi-dimensional arrays with lower bounds of zero are treated as multi-dimensional arrays whose lower bounds just happen to be zero (that is, as an example of the slower case), rather than the faster case being extended to handle ranks greater than 1.

Of course, .NET could have had a fast-path case for zero-based multi-dimensional arrays, but then all the reasons for Java not having them would apply, and there's already one special case, and special cases suck; two special cases would suck more. (As it is, one can hit issues when trying to assign a value of one type to a variable of the other type.)

Nothing above shows that Java couldn't possibly have had the sort of multi-dimensional array you talk of; it would have been a sensible enough decision, but the decision that was made was also sensible.

answered Oct 06 '22 by Jon Hanna