Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Defining a set of ground-rules for high-performance data structures (java)

I generally use vectors/arraylists , hashmaps/treemaps, and other java collections interchangeably, with exception of the fact that there are sometimes functional API requirements (for example, I might need a sorted data set in certain instances).

Lately, however, I've found a need to push java performance to the limit for some algorithms I'm running.

Is there a set of guidelines for high-performance data structures, that I can use as ground rules for my coding ?

I'm looking for general rules, but, in this context, answers to the following questions might also be very helpful :

1) When should I use multidimensional arrays instead of nested Collections ?

2) Vectors vs. ArrayLists - is there truly a performance difference ?

3) Do collection API's like Google's collections, java tricks (like reflection and casting), and other common java developer idioms tend to slow down the JVM when it is under heavy load ?

4) Do primitives vs regular objects (i.e. Double vs double) slow down the JVM when doing lots of calculations ?

5) Are there other important guidelines for dealing with large collections in java programs which need to be high-performance ?

  • Note : at this point, I'm not doing any multithreading... I realize that there are other constraints which might apply once I start parallelizing .
like image 437
jayunit100 Avatar asked Nov 17 '11 20:11

jayunit100


People also ask

How do you define a data structure in Java?

What is a Data Structure in Java? The term data structure refers to a data collection with well-defined operations and behavior or properties. A data structure is a unique way of storing or organizing the data in computer memory so that we can use it effectively.

What data structure would be most useful for counting various similar elements in a set?

There are different data structures based on hashing, but the most commonly used data structure is the hash table. Hash tables are generally implemented using arrays.

What are the 2 main types of data structures?

Basically, data structures are divided into two categories: Linear data structure. Non-linear data structure.


2 Answers

All performance issues should be addressed first by profiling (for both time and memory/object use). Don't optimize things that aren't a factor in the performance of your code. With that caveat, there are some general rules of thumb (that should all be tested by profiling!)

1) When should I use multidimensional arrays instead of nested Collections ?

When you don't need the dynamic sizing of Collections and don't need to feed your data to anything that requires a Collection, then multidimensional arrays (arrays of arrays, actually) can be a faster.

2) Vectors vs. ArrayLists - is there truly a performance difference ?

Yes. Many methods in Vector are synchronized, which is expensive. If you aren't multithreading, then avoid Vector. Even if you are, the granularity of the synchronization is usually wrong and you're better off providing thread safety yourself.

3) Do collection API's like Google's collections, java tricks (like reflection and casting), and other common java developer idioms tend to slow down the JVM when it is under heavy load ?

Reflection is slow; garbage collection is slow. Anything you can do to avoid those will speed things up.

4) Do primitives vs regular objects (i.e. Double vs double) slow down the JVM when doing lots of calculations ?

Yes. Autoboxing/unboxing can create huge amounts of garbage very quickly. This all has to be collected, which will also slow down your program.

5) Are there other important guidelines for dealing with large collections in java programs which need to be high-performance ?

Prefer local method variables to field accesses. You can find many other guidelines by searching the web. The main thing, though, is to profile.

Edit: There's a good collection of performance hints here.

like image 132
Ted Hopp Avatar answered Nov 04 '22 10:11

Ted Hopp


To answer your 4) Yes, Double vs double definitely changes the performances

When you have collections made up of primitives you certainly can use collections backed by primitives, like the very good Trove API. By avoiding constant primitive-to-object and vice-versa (un)boxing you save both memory and precious time.

Also the Vector class is, by now, pretty much a thing of the past.

like image 45
TacticalCoder Avatar answered Nov 04 '22 08:11

TacticalCoder