I have two large (1000+ object) ArrayLists that I need to compare and manipulate. I essentially need to take a value from ArrayList A, look for a matching object in ArrayList B, then manipulate the object from B. I need to do this for all objects in A, and I need to do this frequently in the application. The order is not known and the sizes will differ.
(pseudocode)
ArrayList<myObject> A
ArrayList<myObject> B
I could loop through every single item in B looking for the one that matches the entity from A, for each entity in A. That just seems so inefficient.
(pseudocode)
for (each object in A){loop through all of B and find it}
Would it be worth converting B to a HashMap (using the specific value I am comparing as the key, and the object as the value), then searching B that way, then converting that temporary HashMap back to an ArrayList when I'm done processing?
(pseudocode)
convert B to HashMap<myObject.myValue,myObject> C
for (each object in A){look up the value in C}
convert C back to an ArrayList
Is this a good idea? Or is this premature/unnecessary optimization? Thank you.
(Background: Data comes to me from the service as an ArrayList, and the frontend needs an ArrayList for the view layer. I'm trying to make this middle-tier processing more efficient, but the entry and exit objects must be an ArrayList or some other List.)
The ArrayList has O(n) performance for every search, so for n searches its performance is O(n^2). The HashMap has O(1) performance for every search (on average), so for n searches its performance will be O(n). While the HashMap has an up-front construction cost and takes more memory, it will be faster for large values of n.
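As a minimal sketch of the difference, assuming the objects are matched on a `String` id field (`MyObject` and its fields here are made-up examples, not from the question):

```java
import java.util.*;

public class MatchDemo {
    // Hypothetical type; assumes objects are matched on a String id.
    record MyObject(String id, int payload) {}

    // Build a lookup map from B keyed on the matching value: O(n) once.
    static Map<String, MyObject> index(List<MyObject> b) {
        Map<String, MyObject> map = new HashMap<>();
        for (MyObject obj : b) {
            map.put(obj.id(), obj);
        }
        return map;
    }

    public static void main(String[] args) {
        List<MyObject> a = List.of(new MyObject("x", 1));
        List<MyObject> b = List.of(new MyObject("x", 2), new MyObject("y", 3));

        Map<String, MyObject> c = index(b);   // O(n) build
        for (MyObject obj : a) {              // n lookups, O(1) average each
            MyObject match = c.get(obj.id());
            if (match != null) {
                System.out.println("matched " + match.id());
            }
        }
    }
}
```

The nested-loop version would scan all of B for every element of A; here the scan is replaced by one hash lookup per element of A.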
ArrayList allows duplicate elements. HashMap allows duplicate values but does not allow duplicate keys. ArrayList's get(index) is O(1), but searching for an element by value (contains/indexOf) is O(n). The HashMap get() method has O(1) time complexity on average; its worst case with many colliding keys is O(n) (reduced to O(log n) per bucket since Java 8, where heavily colliding buckets become balanced trees).
Yes, for large numbers, a HashMap is beneficial.
Your initial algorithm will take a long time, looping through both lists in nested for loops. This is an O(n^2) algorithm. Even assuming 1000 items each in A and B, and assuming a cost of 1 for comparing two individual items, one from A and one from B, you're looking at roughly 500k comparisons on average (each search through B stops halfway on average, rather than scanning all 1,000,000 pairs). Doing this frequently will cause slow performance.
Assuming you have a good hash code algorithm for your objects, looking up a value in a HashMap is O(1) access. You'll still spend O(n) time constructing it, but that's nothing compared to O(n^2) if you have a large number of items.
Construct your HashMap "C" once using data from "B" and use it many times, as long as B's information hasn't changed. If you "need to do this frequently", then the performance will be even better because the HashMap is already built -- just reuse it.
If you need to maintain the order, store the B list index as the value in the hash map.
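A minimal sketch of that index-as-value idea, again assuming a made-up `String` key field for matching:

```java
import java.util.*;

public class IndexMapDemo {
    // Hypothetical type; the matching field is assumed to be a String key.
    record MyObject(String key, int value) {}

    public static void main(String[] args) {
        List<MyObject> b = List.of(new MyObject("p", 10), new MyObject("q", 20));

        // Map each object's matching key to its position in B,
        // so the original list order can still be recovered.
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < b.size(); i++) {
            positions.put(b.get(i).key(), i);
        }

        int idx = positions.get("q");  // O(1) lookup of q's position in B
        System.out.println(idx);       // prints 1
    }
}
```

With the index in hand, `b.get(idx)` retrieves the object itself in O(1), and B never has to be rebuilt from the map.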
You don't need to be "converting that temporary hashmap back to an arraylist", because creating the HashMap "C" doesn't destroy the original list "B". One thing to be careful of is whether the objects in list B change, forcing updates to the HashMap to keep it consistent. Another thing to watch is your memory usage for very large lists -- can you keep the objects, the list, and the hashmap in memory at once?
Your pseudocode:
for each index in B:
    get object b at that index
    put (b's matching value -> index) into hash map C
for each a in A:
    look up a's matching value in C
    if found: do something with the object at that index in B
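That pseudocode might look like the following in Java (a sketch only; `MyObject`, its `key` field, and the `+= 100` manipulation are all placeholders for whatever your real matching value and processing are):

```java
import java.util.*;

public class PseudocodeDemo {
    // Hypothetical mutable type; matching is assumed to be on a String key.
    static class MyObject {
        final String key;
        int value;
        MyObject(String key, int value) { this.key = key; this.value = value; }
    }

    public static void main(String[] args) {
        List<MyObject> a = List.of(new MyObject("x", 0));
        List<MyObject> b = List.of(new MyObject("x", 1), new MyObject("y", 2));

        // for each index in B: put (b.key -> index) into hash map C
        Map<String, Integer> c = new HashMap<>();
        for (int i = 0; i < b.size(); i++) {
            c.put(b.get(i).key, i);
        }

        // for each a in A: if found in C, manipulate the matching object in B
        for (MyObject obj : a) {
            Integer idx = c.get(obj.key);
            if (idx != null) {
                b.get(idx).value += 100;   // placeholder "manipulate" step
            }
        }

        System.out.println(b.get(0).value);  // prints 101
    }
}
```

Note that B itself is untouched as a list: the map only points into it, so no conversion back to an ArrayList is needed.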
For smaller numbers, the O(n^2) running times will be small enough that building the HashMap won't be worth it. That is a decision you'll need to make -- you would need to decide when the lists are large enough that building a HashMap and using it is worth the construction cost.