Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I safely copy collections?

In the past, I've said to safely copy a collection do something like:

public static void doThing(List<String> strs) {
    List<String> newStrs = new ArrayList<>(strs);

or

public static void doThing(NavigableSet<String> strs) {
    NavigableSet<String> newStrs = new TreeSet<>(strs);

But are these "copy" constructors, similar static creation methods and streams, really safe and where are the rules specified? By safe I mean are the basic semantic integrity guarantees offered by the Java language and collections enforced against a malicious caller, assuming backed up by a reasonable SecurityManager and that there are no flaws.

I'm happy with the method throwing ConcurrentModificationException, NullPointerException, IllegalArgumentException, ClassCastException, etc., or perhaps even hanging.

I have chosen String as an example of an immutable type argument. For this question, I'm not interested in deep copies for collections of mutable types which has its own gotchas.

(To be clear, I have looked at the OpenJDK source code and have some kind of answer for ArrayList and TreeSet.)

like image 442
Tom Hawtin - tackline Avatar asked Mar 09 '20 13:03

Tom Hawtin - tackline


People also ask

Is collections copy deep copy?

Collections. copy() does not perform a deep copy. It simply copies elements from one collection to another.

How will you copy elements from a source list to another list?

Constructor A simple way to copy a List is by using the constructor that takes a collection as its argument: List<Plant> copy = new ArrayList<>(list); Since we're copying references here, and not cloning the objects, every amends made in one element will affect both lists.

How do I return a copy of a list in Java?

ArrayList clone() method in Java with Examples ArrayList. clone() method is used to create a shallow copy of the mentioned array list. It just creates a copy of the list.


1 Answers

There’s no real protection against intentionally malicious code running within the same JVM in ordinary APIs, like the Collection API.

As can easily be demonstrated:

public static void main(String[] args) throws InterruptedException {
    Object[] array = { "foo", "bar", "baz", "and", "another", "string" };
    array[array.length - 1] = new Object() {
        @Override
        public String toString() {
            Collections.shuffle(Arrays.asList(array));
            return "string";
        }
    };
    doThing(new ArrayList<String>() {
        @Override public Object[] toArray() {
            return array;
        }
    });
}

public static void doThing(List<String> strs) {
    List<String> newStrs = new ArrayList<>(strs);

    System.out.println("made a safe copy " + newStrs);
    for(int i = 0; i < 10; i++) {
        System.out.println(newStrs);
    }
}
made a safe copy [foo, bar, baz, and, another, string]
[bar, and, string, string, another, foo]
[and, baz, bar, string, string, string]
[another, baz, and, foo, bar, string]
[another, bar, and, foo, string, and]
[another, baz, string, another, and, foo]
[string, and, another, foo, string, foo]
[baz, string, foo, and, baz, string]
[bar, another, string, and, another, baz]
[bar, string, foo, string, baz, and]
[bar, string, bar, another, and, foo]

As you can see, expecting a List<String> doesn’t guaranty to actually get a list of String instances. Due to type erasure and raw types, there isn’t even a fix possible on the list implementation side.

The other thing, you can blame ArrayList’s constructor for, is the trust in the incoming collection’s toArray implementation. TreeMap isn’t affected in the same way, but only because there is no such performance gain from passing the array, as in the construction of an ArrayList. Neither class guarantees a protection in the constructor.

Normally, there is no point in trying to write code assuming intentionally malicious code around every corner. There’s too much it can do, to protect against everything. Such protection is only useful for code which does really encapsulate an action which could give a malicious caller access to something, it couldn’t already access without this code.

If you need safety for a particular code, use

public static void doThing(List<String> strs) {
    String[] content = strs.toArray(new String[0]);
    List<String> newStrs = new ArrayList<>(Arrays.asList(content));

    System.out.println("made a safe copy " + newStrs);
    for(int i = 0; i < 10; i++) {
        System.out.println(newStrs);
    }
}

Then, you can be sure that newStrs does only contain strings and can’t get modified by other code after its construction.

Or use List<String> newStrs = List.of(strs.toArray(new String[0])); with Java 9 or newer
Note that Java 10’s List.copyOf(strs) does the same, but its documentation doesn’t state that it is guaranteed not to trust the incoming collection’s toArray method. So calling List.of(…), which will definitely make a copy in case it returns an array based list, is safer.

Since no caller can alter the way, arrays work, dumping the incoming collection into an array, followed by populating the new collection with it, will always make the copy safe. Since the collection can hold a reference to the returned array as demonstrated above, it could alter it during the copy phase, but it can’t affect the copy in the collection.

So any consistency checks should be done after the particular element has been retrieved from the array or on the resulting collection as a whole.

like image 113
Holger Avatar answered Oct 22 '22 14:10

Holger