I generally use vectors/arraylists , hashmaps/treemaps, and other java collections interchangeably, with exception of the fact that there are sometimes functional API requirements (for example, I might need a sorted data set in certain instances).
Lately, however, I've found a need to push java performance to the limit for some algorithms I'm running.
Is there a set of guidelines for high-performance data structures, that I can use as ground rules for my coding ?
I'm looking for general rules, but, in this context, answers to the following questions might also be very helpful :
1) When should I use multidimensional arrays instead of nested Collections ?
2) Vectors vs. ArrayLists - is there truly a performance difference ?
3) Do collection API's like Google's collections, java tricks (like reflection and casting), and other common java developer idioms tend to slow down the JVM when it is under heavy load ?
4) Do primitives vs regular objects (i.e. Double vs double) slow down the JVM when doing lots of calculations ?
5) Are there other important guidelines for dealing with large collections in java programs which need to be high-performance ?
What is a Data Structure in Java? The term data structure refers to a data collection with well-defined operations and behavior or properties. A data structure is a unique way of storing or organizing the data in computer memory so that we can use it effectively.
There are different data structures based on hashing, but the most commonly used data structure is the hash table. Hash tables are generally implemented using arrays.
Basically, data structures are divided into two categories: Linear data structure. Non-linear data structure.
All performance issues should be addressed first by profiling (for both time and memory/object use). Don't optimize things that aren't a factor in the performance of your code. With that caveat, there are some general rules of thumb (that should all be tested by profiling!)
1) When should I use multidimensional arrays instead of nested Collections ?
When you don't need the dynamic sizing of Collections and don't need to feed your data to anything that requires a Collection, then multidimensional arrays (arrays of arrays, actually) can be a faster.
2) Vectors vs. ArrayLists - is there truly a performance difference ?
Yes. Many methods in Vector are synchronized, which is expensive. If you aren't multithreading, then avoid Vector. Even if you are, the granularity of the synchronization is usually wrong and you're better off providing thread safety yourself.
3) Do collection API's like Google's collections, java tricks (like reflection and casting), and other common java developer idioms tend to slow down the JVM when it is under heavy load ?
Reflection is slow; garbage collection is slow. Anything you can do to avoid those will speed things up.
4) Do primitives vs regular objects (i.e. Double vs double) slow down the JVM when doing lots of calculations ?
Yes. Autoboxing/unboxing can create huge amounts of garbage very quickly. This all has to be collected, which will also slow down your program.
5) Are there other important guidelines for dealing with large collections in java programs which need to be high-performance ?
Prefer local method variables to field accesses. You can find many other guidelines by searching the web. The main thing, though, is to profile.
Edit: There's a good collection of performance hints here.
To answer your 4) Yes, Double vs double definitely changes the performances
When you have collections made up of primitives you certainly can use collections backed by primitives, like the very good Trove API. By avoiding constant primitive-to-object and vice-versa (un)boxing you save both memory and precious time.
Also the Vector class is, by now, pretty much a thing of the past.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With