Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does using different ArrayList constructors cause a different growth rate of the internal array?

I seem to stumble across something interesting in ArrayList implementation that I can't wrap my head around. Here is some code that shows what I mean:

public class Sandbox {

    private static final VarHandle VAR_HANDLE_ARRAY_LIST;

    static {
        try {
            Lookup lookupArrayList = MethodHandles.privateLookupIn(ArrayList.class, MethodHandles.lookup());
            VAR_HANDLE_ARRAY_LIST = lookupArrayList.findVarHandle(ArrayList.class, "elementData", Object[].class);
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException();
        }
    }

    public static void main(String[] args) {

        List<String> defaultConstructorList = new ArrayList<>();
        defaultConstructorList.add("one");

        Object[] elementData = (Object[]) VAR_HANDLE_ARRAY_LIST.get(defaultConstructorList);
        System.out.println(elementData.length);

        List<String> zeroConstructorList = new ArrayList<>(0);
        zeroConstructorList.add("one");

        elementData = (Object[]) VAR_HANDLE_ARRAY_LIST.get(zeroConstructorList);
        System.out.println(elementData.length);

    }
}

The idea is if you create an ArrayList like this:

List<String> defaultConstructorList = new ArrayList<>();
defaultConstructorList.add("one");

And look inside what the elementData (Object[] where all elements are kept) the it will report 10. Thus you add one element - you get 9 additional slots that are un-used.

If, on the other hand, you do:

List<String> zeroConstructorList = new ArrayList<>(0);
zeroConstructorList.add("one");

you add one element, space reserved is just for that element, nothing more.

Internally this is achieved via two fields:

/**
 * Shared empty array instance used for empty instances.
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
 * first element is added.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

When you create an ArrayList via new ArrayList(0) - EMPTY_ELEMENTDATA will be used.

When you create an ArrayList via new Arraylist() - DEFAULTCAPACITY_EMPTY_ELEMENTDATA is used.

The intuitive part from inside me - simply screams "remove DEFAULTCAPACITY_EMPTY_ELEMENTDATA" and let all the cases be handled with EMPTY_ELEMENTDATA; of course the code comment:

We distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when first element is added

does make sense, but why would one inflate to 10 (a lot more than I asked for) and the other one to 1 (exactly as much as I requested).


Even if you use List<String> zeroConstructorList = new ArrayList<>(0), and keep adding elements, eventually you will get to a point where elementData is bigger than the one requested:

    List<String> zeroConstructorList = new ArrayList<>(0);
    zeroConstructorList.add("one");
    zeroConstructorList.add("two");
    zeroConstructorList.add("three");
    zeroConstructorList.add("four");
    zeroConstructorList.add("five"); // elementData will report 6, though there are 5 elements only

But the rate at which it grows is smaller than the case of default constructor.


This reminds me about HashMap implementation, where the number of buckets is almost always more than you asked for; but there that is done because of the need for "power of two" buckets needed, not the case here though.

So the question is - can someone explain this difference to me?

like image 238
Eugene Avatar asked Jun 18 '19 10:06

Eugene


People also ask

How does ArrayList grow internally?

Internally an ArrayList uses an Object[] . As you add items to an ArrayList , the list checks to see if the backing array has room left. If there is room, the new item is just added at the next empty space. If there is not room, a new, larger, array is created, and the old array is copied into the new one.

How does the size of ArrayList grow dynamically?

The ArrayList size increases dynamically because whenever the ArrayList class requires to resize then it will create a new array of bigger size and copies all the elements from the old array to the new array. And now it is using the new array's reference for its internal usage.

Why are ArrayLists slower than array?

An Array is a collection of similar items. Whereas ArrayList can hold item of different types. An array is faster and that is because ArrayList uses a fixed amount of array.

What is the difference between array and ArrayList why ArrayList is better than array?

Array is strongly typed. This means that an array can store only specific type of items\elements. ArrayList can store any type of items\elements. No need to cast elements of an array while retrieving because it is strongly typed and stores a specific type of items only.


2 Answers

You get precisely what you asked for, respective what has been specified, even in older versions, where the implementation was different:

ArrayList()

Constructs an empty list with an initial capacity of ten.

ArrayList(int)

Constructs an empty list with the specified initial capacity.

So, constructing the ArrayList with the default constructor will give you an ArrayList with an initial capacity of ten, so as long as the list size is ten or smaller, no resize operation will ever be needed.

In contrast, the constructor with the int argument will precisely use the specified capacity, subject to the growing policy which is specified as

The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.

which applies even when you specify an initial capacity of zero.

Java 8 added the optimization that the creation of the ten elements array is postponed until the first element is added. This is specifically addressing the common case that ArrayList instances (created with the default capacity) stay empty for a long time or even their entire lifetime. Further, when the first actual operation is addAll, it might skip the first array resize operation. This does not affect lists with an explicit initial capacity, as those are usually chosen carefully.

As stated in this answer:

According to our performance analysis team, approximately 85% of ArrayList instances are created at default size so this optimization will be valid for an overwhelming majority of cases.

The motivation was to optimize precisely these scenarios, not to touch the specified default capacity, which was defined back when ArrayList was created. (Though JDK 1.4 is the first one specifying it explicitly)

like image 52
Holger Avatar answered Sep 17 '22 18:09

Holger


If you use the default constructor, the idea is to try to balance memory usage and reallocation. Hence a small default size (10) is used that should be fine for most applications.

If you use the constructor with an explicit size, it is assumed that you know what you're doing. If you initialize it with 0 you are essentially saying: I am pretty sure this will either stay empty or not grow beyond very few elements.

Now if you look at the implementations of ensureCapacityInternal in openjdk (link), you can see that only the first time you add an item, this difference comes into play:

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}

If the default constructor is used, the size grows to DEFAULT_CAPACITY (10). This is to prevent too many reallocations if multiple elements are added. However if you explicitly created this ArrayList with size 0, it will simply grow to size 1 on the first element you add. This is because you told it that you know what you're doing.

ensureExplicitCapacity basically just calls grow (with some range/overflow checks), so let's look at that:

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

As you can see, it doesn't simply grow to a specific size, but it tries to be smart. The bigger the array is, the bigger it will grow even if minCapacity is just 1 bigger than the current capacity. The reasoning behind that is simple: The probability that a lof of items will be added is higher if the list is already big and vice versa. This is also why you see growth increments by 1 and then by 2 after the 5th element.

like image 31
Max Vollmer Avatar answered Sep 20 '22 18:09

Max Vollmer