
Collections emptyList/singleton/singletonList/List/Set toArray

Suppose I have this code:

String[] left = { "1", "2" };
String[] leftNew = Collections.emptyList().toArray(left);
System.out.println(Arrays.toString(leftNew));

This will print [null, 2]. This sort of makes sense: since we have an empty list, it is somehow supposed to cope with the fact that we are passing an array that is bigger, so it sets the first element to null. This is probably saying that the first element does not exist in the empty list, thus it is set to null.

But this is still confusing, since we pass an array with a certain type only to help infer the type of the returned array; but anyway this is something that has at least a certain logic. But what if I do:

String[] right = { "nonA", "b", "c" };
// or Collections.singletonList("a");
// or a plain List or Set; does not matter
String[] rightNew = Collections.singleton("a").toArray(right);
System.out.println(Arrays.toString(rightNew));

Taking the previous example as a reference, I would expect this one to show:

["a", "b", "c"]

But, a bit unexpectedly for me, it prints:

[a, null, c]

And, of course, I go to the documentation that explicitly says this is expected:

If this set fits in the specified array with room to spare (i.e., the array has more elements than this set), the element in the array immediately following the end of the set is set to null.

OK, good, this is at least documented. But it later says:

This is useful in determining the length of this set only if the caller knows that this set does not contain any null elements.

This is the part in the documentation that confuses me the most :|

And an even funner example that makes little sense to me:

String[] middle = { "nonZ", "y", "u", "m" };
List<String> list = new ArrayList<>();
list.add("z");
list.add(null);
list.add("z1");
System.out.println(list.size()); // 3

String[] middleNew = list.toArray(middle);
System.out.println(Arrays.toString(middleNew));

This will print:

[z, null, z1, null]

So it clears the last element of the array, but why would it not do that in the first example?

Can someone shed some light here?

Eugene asked Aug 17 '18 20:08


3 Answers

The <T> T[] toArray(T[] a) method on Collection is weird, because it's trying to fulfill two purposes at once.

First, let's look at toArray(). This takes the elements from the collection and returns them in an Object[]. That is, the component type of the returned array is always Object. That's useful, but it doesn't satisfy a couple other use cases:

1) The caller wants to re-use an existing array, if possible; and

2) The caller wants to specify the component type of the returned array.

Handling case (1) turns out to be a fairly subtle API problem. The caller wants to re-use an array, so it clearly needs to be passed in. Unlike the no-arg toArray() method, which returns an array of the right size, if the caller's array is re-used, we need a way to return the number of elements copied. OK, let's have an API that looks like this:

int toArray(T[] a)

The caller passes in an array, which is reused, and the return value is the number of elements copied into it. The array doesn't need to be returned, because the caller already has a reference to it. But what if the array is too small? Well, maybe throw an exception. In fact, that's what Vector.copyInto does.

void copyInto(Object[] anArray)

This is a terrible API. Not only does it not return the number of elements copied, it throws IndexOutOfBoundsException if the destination array is too short. Since Vector is a concurrent collection, the size might change at any time before the call, so the caller cannot guarantee that the destination array is of sufficient size, nor can it know the number of elements copied. The only thing the caller can do is to lock the Vector around the entire sequence:

synchronized (vec) {
    Object[] a = new Object[vec.size()];
    vec.copyInto(a);
}

Ugh!

The Collection.toArray(T[]) API avoids this problem by having different behavior if the destination array is too small. Instead of throwing an exception like Vector.copyInto(), it allocates a new array of the right size. This trades away the array-reuse case for more reliable operation. The problem now is that the caller can't tell whether its array was reused or a new one was allocated. Thus, toArray(T[]) needs to return an array: the argument array, if it was large enough, or the newly allocated array.
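The reuse-or-reallocate behavior is easy to observe with an identity check on the returned array. The following is only a minimal sketch; the class name ToArrayReuseDemo and the helper reused are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ToArrayReuseDemo {
    // Hypothetical helper: true when toArray(T[]) handed back the very array passed in.
    static boolean reused(List<String> source, String[] dest) {
        String[] result = source.toArray(dest);
        return result == dest;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c");
        System.out.println(reused(source, new String[5])); // true: room to spare, array reused
        System.out.println(reused(source, new String[3])); // true: exact fit, array reused
        System.out.println(reused(source, new String[1])); // false: too small, new array allocated
    }
}
```

A caller that really wants to reuse a buffer therefore has to keep working with the returned reference, not the one it passed in.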

But now we have another problem. We no longer have a way to tell the caller the number of elements that were copied from the collection into the array. If the destination array was newly allocated, or the array happens to be exactly the right size, then the length of the array is the number of elements copied. If the destination array is larger than the number of elements copied, the method attempts to communicate to the caller the number of elements copied, by writing a null to the array location one beyond the last element copied from the collection. If it's known that the source collection has no null values, this enables the caller to determine the number of elements copied. After the call, the caller can search for the first null value in the array. If there is one, its position determines the number of elements copied. If there is no null in the array, it knows that the number of elements copied equals the length of the array.
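Here is a minimal sketch of that null-scanning protocol, assuming the source collection contains no nulls. The helper copiedCount is hypothetical (nothing like it exists in the JDK); it simply performs the search for the first null described above.

```java
import java.util.Arrays;
import java.util.List;

class CopiedCount {
    // Hypothetical helper: the index of the first null, or the array length if there is none.
    // Only meaningful when the source collection is known to contain no nulls.
    static int copiedCount(Object[] a) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == null) {
                return i;
            }
        }
        return a.length;
    }

    public static void main(String[] args) {
        String[] buffer = { "x", "x", "x", "x" };
        List<String> source = Arrays.asList("a", "b");

        String[] result = source.toArray(buffer);    // buffer is large enough, so it is reused
        System.out.println(Arrays.toString(result)); // [a, b, null, x]
        System.out.println(copiedCount(result));     // 2
    }
}
```

Note that the stale "x" at index 3 is left untouched; only the slot immediately after the copied elements is nulled.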

Quite frankly, this is pretty lame. However, given the constraints on the language at the time, I admit I don't have a better alternative.

I don't think I've ever seen any code that reuses arrays or that checks for nulls this way. This is probably a holdover from the early days when memory allocation and garbage collection were expensive, so people wanted to reuse memory as much as possible. More recently, the accepted idiom for using this method has been the second use case described above, that is, to establish the desired component type of the array as follows:

MyType[] a = coll.toArray(new MyType[0]);

(It seems wasteful to allocate a zero-length array for this purpose, but it turns out that this allocation can be optimized away by the JIT compiler, and the obvious alternative toArray(new MyType[coll.size()]) is actually slower. This is because of the need to initialize the array to nulls, and then to fill it in with the collection's contents. See Aleksey Shipilëv's article on this topic, Arrays of Wisdom of the Ancients.)

However, many people find the zero-length array counterintuitive. In JDK 11, there is a new API that allows one to use an array constructor reference instead:

MyType[] a = coll.toArray(MyType[]::new);

This lets the caller specify the component type of the array, but it lets the collection provide the size information.
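To make the comparison concrete, here is a small sketch of the three idioms side by side, using String in place of MyType. The class and method names are illustrative only, and the last method assumes Java 11 or later.

```java
import java.util.Arrays;
import java.util.List;

class ToArrayIdioms {
    // Zero-length array: only conveys the component type; the collection allocates the result.
    static String[] zeroLength(List<String> coll) {
        return coll.toArray(new String[0]);
    }

    // Pre-sized array: the caller allocates, the collection fills it in.
    static String[] preSized(List<String> coll) {
        return coll.toArray(new String[coll.size()]);
    }

    // Array constructor reference (Java 11+): the collection supplies the size.
    static String[] generator(List<String> coll) {
        return coll.toArray(String[]::new);
    }

    public static void main(String[] args) {
        List<String> coll = Arrays.asList("a", "b", "c");
        System.out.println(Arrays.toString(zeroLength(coll))); // [a, b, c]
        System.out.println(Arrays.toString(preSized(coll)));   // [a, b, c]
        System.out.println(Arrays.toString(generator(coll)));  // [a, b, c]
    }
}
```

All three return a String[] with the same contents; they differ only in who allocates the array and who decides its size.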

Stuart Marks answered Oct 10 '22 09:10


It will only clear the element at the index right after the last element of the original list, so in the first example the list is empty, hence it nullifies the element at index zero (the first element, which is "1").

In your last example, it just happens that the last element of the array is the one right after the last element of the original list. In that scenario the null would not really help in determining the size of the list, because the list allowed null values.

But if the list did not allow null (e.g. the immutable lists introduced in Java 9), then this is useful: when looping over the returned array you would not want to process the extra elements, so you can stop iterating at the first null element.

M A answered Oct 10 '22 11:10


From the JDK 9 source code for ArrayList:

@SuppressWarnings("unchecked")
public <T> T[] toArray(T[] a) {
    if (a.length < size)
        // Make a new array of a's runtime type, but my contents:
        return (T[]) Arrays.copyOf(elementData, size, a.getClass());
    System.arraycopy(elementData, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}

and in Arrays.ArrayList, the List implementation returned by Arrays.asList:

@Override
@SuppressWarnings("unchecked")
public <T> T[] toArray(T[] a) {
    int size = size();
    if (a.length < size)
        return Arrays.copyOf(this.a, size,
                             (Class<? extends T[]>) a.getClass());
    System.arraycopy(this.a, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}

If the size of the list to be converted to an array is size, then they both set a[size] to null.

With an empty list, size is 0 so a[0] is set to null, and the other elements are not touched.

With a singleton list, size is 1 so a[1] is set to null, and the other elements are not touched.

If the size of the list is one less than the length of the array, a[size] refers to the last element of the array, so it is set to null. In your example, the list itself contains a null in the second position (index 1), so that null is copied as an ordinary element. Anyone scanning for null to count elements would stop there instead of at the other null, the one written just beyond the list's contents. These nulls can't be told apart.
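For what it's worth, the a[size] = null rule can be checked against all three cases from the question. This is just a sketch with made-up names (NullMarkerCases, applied); the printed arrays are the ones reported in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NullMarkerCases {
    // Hypothetical helper: copies source into dest via toArray(T[]) and renders the result.
    static String applied(List<String> source, String[] dest) {
        return Arrays.toString(source.toArray(dest));
    }

    public static void main(String[] args) {
        // size 0: only dest[0] is overwritten with null
        System.out.println(applied(Collections.<String>emptyList(),
                new String[] { "1", "2" }));                      // [null, 2]

        // size 1: "a" lands in dest[0], dest[1] becomes the null marker
        System.out.println(applied(Collections.singletonList("a"),
                new String[] { "nonA", "b", "c" }));              // [a, null, c]

        // size 3 with a null element: dest[3] becomes the marker
        List<String> list = new ArrayList<>();
        list.add("z");
        list.add(null);
        list.add("z1");
        String[] middle = list.toArray(new String[] { "nonZ", "y", "u", "m" });
        System.out.println(Arrays.toString(middle));              // [z, null, z1, null]

        // Scanning for the first null finds the element, not the marker
        System.out.println(Arrays.asList(middle).indexOf(null));  // 1, although list.size() is 3
    }
}
```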

rgettman answered Oct 10 '22 10:10