Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of Collection class in Java

All,

I have been going through a lot of sites that post about the performance of various Collection classes for various actions i.e. adding an element, searching and deleting. But I also notice that all of them provide different environments in which the test was conducted i.e. O.S, memory, threads running etc.

My question is, if there is any site/material that provides the same performance information on best test environment basis? i.e. the configurations should not be an issue or catalyst for poor performance of any specific data structure.

[Updated]: Example, HashSet and LinkedHashSet both have a complexity of O (1) for inserting an element. However, Bruce Eckel' test claims that insertion is going to take more time for LinkedHashSet than for HashSet [http://www.artima.com/weblogs/viewpost.jsp?thread=122295]. So should I still go by the Big-Oh notation ?

like image 744
name_masked Avatar asked Oct 19 '10 22:10

name_masked


People also ask

What collection class is good for performance?

Performance of Collection Classes If you are not interested in how objects will be organized in a collection, then the only other consideration is performance. In that case, use the ArrayList class. It is fast and makes efficient use of memory.

Which collection class is faster in Java?

If you need fast access to elements using index, ArrayList should be choice. If you need fast access to elements using a key, use HashMap. If you need fast add and removal of elements, use LinkedList (but it has a very poor seeking performance).

Are Collections faster than arrays?

In this article we can discover the differences in use of arrays and collections. At the first part we can observe that arrays occupy less memory and are constructed faster than collections.

What are the advantages of Collections in Java?

Benefits of the Java Collections FrameworkReduces programming effort: By providing useful data structures and algorithms, the Collections Framework frees you to concentrate on the important parts of your program rather than on the low-level "plumbing" required to make it work.


3 Answers

Here are my recommendations:

  1. First of all, don't optimize :) Not that I am telling you to design crap software, but just to focus on design and code quality more than premature optimization. Assuming you've done that, and now you really need to worry about which collection is best beyond purely conceptual reasons, let's move on to point 2
  2. Really, don't optimize yet (roughly stolen from M. A. Jackson)
  3. Fine. So your problem is that even though you have theoretical time complexity formulas for best cases, worst cases and average cases, you've noticed that people say different things and that practical settings are a very different thing from theory. So run your own benchmarks! You can only read so much, and while you do that your code doesn't write itself. Once you're done with the theory, write your own benchmark - for your real-life application, not some irrelevant mini-application for testing purposes - and see what actually happens to your software and why. Then pick the best algorithm. It's empirical, it could be regarded as a waste of time, but it's the only way that actually works flawlessly (until you reach the next point).
  4. Now that you've done that, you have the fastest app ever. Until the next update of the JVM. Or of some underlying component of the operating system your particular performance bottleneck depends on. Guess what? Maybe your clients have different ones. Here comes the fun: you need to be sure that your benchmark is valid for others or in most cases (or have fun writing code for different cases). You need to collect data from users. LOTS. And then you need to do that over and over again to see what happens and if it still holds true. And then re-write your code accordingly over and over again (The - now terminated - Engineering Windows 7 blog is actually a nice example of how user data collection helps to make educated decisions to improve user experience.

Or you can... you know... NOT optimize. Platforms and compilers will change, but a good design should - on average - perform well enough.

Other things you can also do:

  • Have a look at the JVM's source code. It's very educative and you discover a herd of hidden things (I'm not saying that you have to use them...)
  • See that other thing on your TODO list that you need to work on? Yes, the one near the top but that you always skip because it's too hard or not fun enough. That one right there. Well get to it and leave the optimization thingy alone: it's the evil child of a Pandora's Box and a Moebius band. You'll never get out of it, and you'll deeply regret you tried to have your way with it.

That being said, I don't know why you need the performance boost so maybe you have a very valid reason.

And I am not saying that picking the right collection doesn't matter. Just that ones you know which one to pick for a particular problem, and that you've looked at alternatives, then you've done your job without having to feel guilty. The collections have usually a semantic meaning, and as long as you respect it you'll be fine.

like image 54
haylem Avatar answered Oct 03 '22 15:10

haylem


In my opinion, all you need to know about a data structure is the Big-O of the operations on it, not subjective measures from different architectures. Different collections serve different purposes.

Maps are dictionaries
Sets assert uniqueness
Lists provide grouping and preserve iteration order
Trees provide cheap ordering and quick searches on dynamically changing contents that require constant ordering

Edited to include bwawok's statement on the use case of tree structures

Update
From the javadoc on LinkedHashSet

Hash table and linked list implementation of the Set interface, with predictable iteration order.

...

Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.

Now we have moved from the very general case of choosing an appropriate data-structure interface to the more specific case of which implementation to use. However, we still ultimately arrive at the conclusion that specific implementations are well suited for specific applications based on the unique, subtle invariant offered by each implementation.

like image 29
Tim Bender Avatar answered Oct 03 '22 13:10

Tim Bender


What do you need to know about them, and why? The reason that benchmarks show a given JDK and hardware setup is so that they could (in theory) be reproduced. What you should get from benchmarks is an idea of how things will work. For an ABSOLUTE number, you will need to run it vs your own code doing your own thing.

The most important thing to know is the Big O runtime of various collections. Knowing that getting an element out of an unsorted ArrayList is O(n), but getting it out of a HashMap is O(1) is HUGE.

If you are already using the correct collection for a given job, you are 90% of the way there. The times when you need to worry about how fast you can, say, get items out of a HashMap should be pretty darn rare.

Once you leave single threaded land and move into multi-threaded land, you will need to start worrying about things like ConcurrentHashMap vs Collections.synchronized hashmap. Until you are multi threaded, you can just not worry about this kind of stuff and focus on which collection for which use.

Update to HashSet vs LinkedHashSet

I haven't ever found a use case where I needed a Linked Hash Set (because if I care about order I tend to have a List, if I care about O(1) gets, I tend to use a HashSet. Realistically, most code will use ArrayList, HashMap, or HashSet. If you need anything else, you are in a "edge" case.

like image 28
bwawok Avatar answered Oct 03 '22 15:10

bwawok