Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is more efficient : using removeAll() or using the following HashMap technique to retain only changed records in an ArrayList

I have 2 ArrayLists A and B of the same datastructure C (hashCode() and equals() overridden). C represents a student's record. The two lists are of the same size and represent new student records and old ones respectively (the students are the same in both the lists, ordering might be different). I wish to keep only those records in A that have been changed. As such, I do :

 A.removeAll(B)

As per the javadocs, this would take each record of A and compare with each record of B, and if it finds both equal, it will remove the record from A. If a record of A is not found to be equal to any record in B, and since all students in A are also in B, it means that that record of A has changed. The problem is that its easily of n square complexity.

Another approach can be :

Map<C> map = new HashMap<C>();
for (C record : B){
    map.add(record.getStudentId(),record);
}
List<C> changedRecords = new ArrayList<C>();
for (C record : A){
    if (record.equals(map.get(record.getStudentId())){
        changedRecords.add(record);
    }
}

I think this might be of a lower complexity than the above solution. Is that correct ?

like image 458
Daud Avatar asked Apr 03 '12 07:04

Daud


2 Answers

Yes the latter algorithm is better than O(n^2), since you have two loops, one ranging over B and another over A and you do (amortized) constant work in each loop, your new solution runs in O(|A| + |B|).

I suspect that you don't have any duplicate entries though. If this is the case, you could also go via a HashSet (change to LinkedHashSet if you want to preserve the order in A):

HashSet<C> tmp = new HashSet<C>(A);
tmp.removeAll(B);                     // Linear operation
A = new ArrayList<C>(tmp);

(Or if order doesn't matter to you, you could use HashSets all the way through.)


As pointed out by @Daud in the comments below, HashSet.removeAll(Collection c) actually calls c.contains repeatedly if the size of the hash set is smaller than the collection which affects the complexity (at least in OpenJDK). This is because the implementation always chooses to iterate over the smaller collection.

like image 112
aioobe Avatar answered Sep 20 '22 13:09

aioobe


What you may save on complexity you could be losing in memory allocation so isn't necessarily more efficient. Arrraylist uses something similar to an in place partitioning algorithm to run down the backing array and test against the compare.

When comparing it simply looks to find the index of the first occurrence of a match against the backing array Object[]. The algorithm maintains two indexes, one to iterate through the backing array and one as a placeholder for matches. In the case of a match it simply moves the index on the backing array and carries on to the next incoming element; this is relatively cheap.

If it comes to a point where it finds that the incoming collection doesn't contain the value at the current index in the backing array it simply overwrites the element where the last match occurred with the element at the current index without incurring a new memory allocation. This pattern repeats until all elements in the ArrayList have been compared against the incoming collection, hence the complexity you are concerned about.

For example: Consider an arraylist A with 1,2,4,5 and a collection 'C' with 4,1 that we match against; wanting to remove 4 and 1. here is each iteration on the for loop that would go 0 -> 4

Iteration: r is the for loop index on arraylist a for (; r < size; r++)

r = 0 (does C contain 1? Yes, skip to the next one) A: 1,2,4,5 w = 0

r = 1 (Does C contain 2? No, copy the value at r into the spot pointed to by w++) A: 2,2,4,5 w=1

r = 2 (Does C contain 4?, Yes skip) A: 2,2,4,5 w=1

r = 3 (Does C contain 5? No, copy the value at r into the spot pointed to by w++)

A: 2,5,4,5 w=2

r=4, stop

Compare w to the size of the backing array which is 4. Since they are not equal Null out the values from w on to the end of the array and reset the size.

A: 2,5 size of 2

The built in removeAll also considers that ArrayLists can contain null. You could throw an NPE on record.getStudentId() in your solution above. Finally, removeAll protects against exceptions in the compare on Collection.contains. if that happens it uses finally to do a native memcopy which protects the backing array from corruption in a highly efficient manner.

like image 25
TechTrip Avatar answered Sep 22 '22 13:09

TechTrip