The TL;DR version, for those who don't want the background, is the following specific question:
Why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?
Java has multidimensional arrays at the syntax level, in that one can declare
int[][] arr = new int[10][10];
but it seems that this is really not what one might have expected. Rather than having the JVM allocate a contiguous block of RAM big enough to store 100 ints, it comes out as an array of arrays of ints: each layer is a contiguous block of RAM, but the thing as a whole is not. Accessing arr[i][j] is thus rather slow: the JVM has to

1. fetch the int[] stored at arr[i];
2. then index into it to fetch the int stored at arr[i][j].

This involves querying an object to go from one layer to the next, which is rather expensive.
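To make the contrast concrete, here is a minimal sketch of the two access patterns side by side (the class, method names and sizes are mine, purely for illustration; this is not what the JVM emits):

public class AccessPatterns {
    static final int COLS = 10;

    // Array of arrays: the JVM first loads the row object nested[i],
    // then indexes into that row. Two dependent memory accesses.
    static int nestedGet(int[][] nested, int i, int j) {
        return nested[i][j];
    }

    // One contiguous block: compute a row-major offset, then do a
    // single indexed load.
    static int flatGet(int[] flat, int i, int j) {
        return flat[i * COLS + j];
    }

    public static void main(String[] args) {
        int[][] nested = new int[10][10];
        int[] flat = new int[10 * 10];
        nested[3][4] = 42;
        flat[3 * COLS + 4] = 42;
        System.out.println(nestedGet(nested, 3, 4)); // 42
        System.out.println(flatGet(flat, 3, 4));     // 42
    }
}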
At one level, it's not hard to see why this can't be optimised to a simple scale-and-add lookup even if it were all allocated in one fixed block. The problem is that arr[3] is a reference all of its own, and it can be changed. So although arrays are of fixed size, we could easily write
arr[3] = new int[11];
and now the scale-and-add is screwed because this layer has grown. You'd need to know at runtime whether everything is still the same size as it used to be. In addition, of course, this will then get allocated somewhere else in RAM (it'll have to be, since it's bigger than what it's replacing), so it's not even in the right place for scale-and-add.
It seems to me that this is not ideal, for two reasons.
For one, it's slow. A test I ran with these methods for summing the contents of a single-dimensional or multidimensional array took nearly twice as long (714 seconds vs 371 seconds) for the multidimensional case (an int[1000000] and an int[100][100][100] respectively, filled with random int values, run 1000000 times with warm cache).
public static long sumSingle(int[] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        total+=arr[i];
    return total;
}

public static long sumMulti(int[][][] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        for (int j=0; j<arr[0].length; j++)
            for (int k=0; k<arr[0][0].length; k++)
                total+=arr[i][j][k];
    return total;
}
Secondly, because it's slow, it encourages obscure coding. If you encounter something performance-critical that would naturally be done with a multidimensional array, you have an incentive to write it as a flat array instead, even if that makes the code unnatural and hard to read. You're left with an unpalatable choice: obscure code or slow code.
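To make that choice concrete, here is a minimal sketch of the flat-array idiom the performance pressure pushes you towards (the Int2D class is hypothetical, invented here for illustration); the arithmetic inside it is exactly the scale-and-add discussed above, just hidden behind accessors:

// Hypothetical wrapper: callers see two indices, storage is one block.
public final class Int2D {
    private final int cols;
    private final int[] data;

    public Int2D(int rows, int cols) {
        this.cols = cols;
        this.data = new int[rows * cols];
    }

    // Row-major scale-and-add into the single contiguous array.
    public int get(int i, int j) { return data[i * cols + j]; }

    public void set(int i, int j, int v) { data[i * cols + j] = v; }
}

This recovers the speed, but at the cost just described: every caller now goes through a home-grown class instead of the natural arr[i][j] syntax.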
It seems to me that the basic problem could easily enough be fixed. The only reason, as we saw earlier, that it can't be optimised is that the structure might change. But Java already has a mechanism for making references unchangeable: declare them as final.
Now, just declaring it with
final int[][] arr = new int[10][10];
isn't good enough, because it's only arr that is final here: arr[3] still isn't, and could be changed, so the structure might still change. But if we had a way of declaring things so that it was final throughout, except at the bottom layer where the int values are stored, then we'd have an entire immutable structure, and it could all be allocated as one block, and indexed with scale-and-add.
How it would look syntactically, I'm not sure (I'm not a language designer). Maybe
final int[final][] arr = new int[10][10];
although admittedly that looks a bit weird. This would mean: final at the top layer; final at the next layer; not final at the bottom layer (else the int values themselves would be immutable).
Finality throughout would enable the JIT compiler to optimise this to give performance equal to that of a single-dimensional array, which would then take away the temptation to code that way just to get round the slowness of multidimensional arrays.
(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)
So why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?
A bizarre side note: the difference in timings drops away to only a few percent if you use an int for the running total rather than a long. Why would there be such a small difference with an int, and such a big difference with a long?
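For concreteness, the int-total variant being described would differ from sumSingle only in the accumulator's type (my reconstruction; the multi version changes analogously):

// Same loop as sumSingle, but the running total is an int, not a long.
public static int sumSingleInt(int[] arr) {
    int total = 0;
    for (int i=0; i<arr.length; i++)
        total += arr[i];  // may overflow, but the benchmark only times it
    return total;
}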
Code I used for benchmarking, in case anyone wants to try to reproduce these results:
import java.util.Random;

public class Multidimensional {
    public static long sumSingle(final int[] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            total+=arr[i];
        return total;
    }

    public static long sumMulti(final int[][][] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            for (int j=0; j<arr[0].length; j++)
                for (int k=0; k<arr[0][0].length; k++)
                    total+=arr[i][j][k];
        return total;
    }

    public static void main(String[] args) {
        final int iterations = 1000000;
        Random r = new Random();

        int[] arr = new int[1000000];
        for (int i=0; i<arr.length; i++)
            arr[i]=r.nextInt();
        long total = 0;
        System.out.println(sumSingle(arr));
        long time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumSingle(arr);
        time = System.nanoTime()-time;
        // 'total' is an extra printf argument (ignored by the format string).
        System.out.printf("Took %d ms for single dimension\n", time/1000000, total);

        int[][][] arrMulti = new int[100][100][100];
        for (int i=0; i<arrMulti.length; i++)
            for (int j=0; j<arrMulti[i].length; j++)
                for (int k=0; k<arrMulti[i][j].length; k++)
                    arrMulti[i][j][k]=r.nextInt();
        System.out.println(sumMulti(arrMulti));
        time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumMulti(arrMulti);
        time = System.nanoTime()-time;
        System.out.printf("Took %d ms for multi dimension\n", time/1000000, total);
    }
}
No, Java does not support multi-dimensional arrays. Java supports arrays of arrays. In Java, a two-dimensional array is nothing but an array of one-dimensional arrays.
Arrays in Java are either one-dimensional or multidimensional, and 3D arrays fall under the multidimensional category. A multidimensional array is, in simple terms, an array of arrays, and a 3D array is an array of 2D arrays.
An ArrayList is backed by an array that expands once a certain number of elements have been added to it. A list of lists built this way can therefore become jagged, with inner lists of different lengths, and so can no longer be considered truly multidimensional.
In a multidimensional array, every element in a given dimension has the same fixed size as the other elements in that dimension. In a jagged array, which is an array of arrays, each inner array can be a different size. By using only the space each inner array actually needs, no space is wasted.
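A short sketch of such a jagged structure (the row lengths are invented for illustration):

public class JaggedDemo {
    public static void main(String[] args) {
        // Each row is allocated separately and can have its own length,
        // so only the space each row actually needs is used.
        int[][] jagged = new int[3][];
        jagged[0] = new int[2];
        jagged[1] = new int[7];
        jagged[2] = new int[4];
        for (int[] row : jagged)
            System.out.println(row.length);  // 2, 7, 4
    }
}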
"but it seems that this is really not what one might have expected."
Why?
Consider that the form T[] means "array of type T"; then just as we would expect int[] to mean "array of type int", we would expect int[][] to mean "array of type array of type int", because there's no less reason for having int[] as the T than int.
As such, considering that one can have arrays of any type, it follows just from the way [ and ] are used in declaring and initialising arrays (and, for that matter, {, } and ,) that without some sort of special rule banning arrays of arrays, we get this sort of use "for free".
Now consider also that there are things we can do with jagged arrays that we can't do otherwise:

- The inner arrays can have different lengths, so no space is wasted padding every row out to the longest one.
- An inner array can be shared between rows, so that e.g. lookup[1] is the same array as lookup[5] (see the sketch below). This can allow for massive savings with some data-sets; e.g. many Unicode properties can be mapped for the full set of 1,112,064 code points in a small amount of memory, because leaf arrays of properties can be repeated for ranges with matching patterns.

There are certainly cases where this sort of multi-dimensional array is useful.
Now, the default state of any feature is unspecified and unimplemented. Someone needs to decide to specify and implement a feature, or else it wouldn't exist.
As shown above, the array-of-arrays sort of multidimensional array will exist unless someone decides to introduce a special feature banning arrays of arrays. Since arrays of arrays are useful for the reasons above, banning them would be a strange decision to make.
Conversely, the sort of multidimensional array where an array has a defined rank that can be greater than 1, and so is used with a set of indices rather than a single index, does not follow naturally from what is already defined. Someone would need to decide on a specification for declaring, initialising and using such arrays, then document, implement and test it. Users, too, would have to learn the new feature.

So it has to be worth it: the feature would have to enable idioms that are clearer or faster than what arrays of arrays already provide. In this case, though, arrays of arrays already serve the same purposes, so the gain is marginal.
Really, the question is not "why doesn't Java have true multidimensional arrays?" but "why should it?"
Of course, the points you made in favour of multidimensional arrays are valid, and some languages do have them for that reason, but the burden is nonetheless to argue a feature in, not argue it out.
"(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)"
Like many rumours, there's an element of truth here, but it is not the full truth.
.NET arrays can indeed have multiple ranks. This is not the only way in which it is more flexible than Java. Each rank can also have a lower-bound other than zero. As such, you could for example have an array that goes from -3 to 42 or a two dimensional array where one rank goes from -2 to 5 and another from 57 to 100, or whatever.
C# does not give complete access to all of this from its built-in syntax (you need to call Array.CreateInstance() for lower bounds other than zero), but it does allow you to use the syntax int[,] for a two-dimensional array of int, int[,,] for a three-dimensional array, and so on.
Now, the extra work involved in dealing with lower bounds other than zero adds a performance burden, and yet these cases are relatively uncommon. For that reason single-rank arrays with a lower-bound of 0 are treated as a special case with a more performant implementation. Indeed, they are internally a different sort of structure.
In .NET, multi-dimensional arrays whose lower bounds are zero are treated simply as multi-dimensional arrays whose lower bounds happen to be zero (that is, as instances of the slower case), rather than the faster implementation being extended to handle ranks greater than 1.
Of course, .NET could have had a fast-path case for zero-based multi-dimensional arrays too, but then all the reasons for Java not having them would apply, plus the fact that there's already one special case, and special cases suck; two special cases would suck even more. (As it is, one can run into issues when trying to assign a value of one type to a variable of the other type.)
Nothing above shows that Java couldn't possibly have had the sort of multi-dimensional array you talk of; that would have been a sensible enough decision. But the decision that was actually made was sensible too.