array of structures, or structure of arrays?

Tags:

data-structures

Hmmm. I have a table which is an array of structures I need to store in Java. The naive don't-worry-about-memory approach says do this:

public class Record {
  final private int field1;
  final private int field2;
  final private long field3;
  /* constructor & accessors here */
}

List<Record> records = new ArrayList<Record>();

If I end up using a large number (> 10⁶ ) of records, where individual records are accessed occasionally, one at a time, how would I figure out how the preceding approach (an ArrayList) would compare with an optimized approach for storage costs:

public class OptimizedRecordStore {
  final private int[] field1;
  final private int[] field2;
  final private long[] field3;

  Record getRecord(int i) { return new Record(field1[i],field2[i],field3[i]); }
  /* constructor and other accessors & methods */
}

edit:

assume the # of records is something that is changed infrequently or never
I'm probably not going to use the OptimizedRecordStore approach, but I want to understand the storage cost issue so I can make that decision with confidence.
obviously if I add/change the # of records in the OptimizedRecordStore approach above, I either have to replace the whole object with a new one, or remove the "final" keyword.
kd304 brings up a good point that was in the back of my mind. In other situations similar to this, I need column access on the records, e.g. if field1 and field2 are "time" and "position", and it's important for me to get those values as an array for use with MATLAB, so I can graph/analyze them efficiently.

220

asked Jul 14 '09 14:07

Jason S

3 Answers

The answers that give the general "optimise when you have to" is unhelpful in this case because , IMHO, programmers should always be aware of the performance in different in design choices when that choice leads to an order of magnitude performance penalty, particularly API writers.

The original question is quite valid and I would tend to agree that the second approach is better, given his particular situation. I've written image processing code where each pixel requires a data structure, a situation not too dissimilar to this, except I needed frequent random access to each pixel. The overhead of creating one object for each pixel was enormous.

184

answered Oct 11 '22 18:10

flux

The second version is much, much worse. Instead of resizing one array, you're resizing three arrays when you do an insert or delete. What's more, the second version will lead to the creation of many more temporary objects and it will do so on accesses. That could lead to a lot of garbage (from a GC point of view). Not good.

Generally speaking, you should worry about how you use the objects long before you think about performance. So you have a record with three fields or three arrays. Which one more accurately depicts what you're modeling? By this I mean, when you insert or delete an item, are you doing one of the three arrays or all three as a block?

I suspect it's the latter in which case the former makes far more sense.

If you're really concerned about insertion/deletion performance then perhaps a different data structure is appropriate, perhaps a SortedSet or a Map or SortedMap.

answered Oct 11 '22 19:10

cletus

If you have millions of records, the second approach has several advantages:

Memory usage: the first approach uses more memory because a) every Java object in heap has a header (containing class id, lock state etc.); b) objects are aligned in memory; c) each reference to an object costs 4 bytes (on 64-bit JVMs with Compressed OOPs or 32-bit JVMs) or 8 bytes (64-bit JVMs without Compressed OOPs). See e. g. CompressedOops for more details. So the first approach takes about two times more memory (more precisely: according to my benchmark, an object with 16 bytes of payload + a reference to it took 28 bytes on 32-bit Java 7, 36 bytes on 64-bit Java 7 with compressed OOPs, and 40 bytes on 64-bit Java 7 w/o compressed OOPs).
Garbage collection: although the second approach seems to create many objects (one on each call of getRecord), it might not be so, as modern server JVMs (e. g. Oracle's Java 7) can apply escape analysis and stack allocation to avoid heap allocation of temporary objects in some cases; anyway, GCing short-lived objects is cheap. On the other hand, it is probably easier for the garbage collector if there are not millions of long-lived objects (as there are in the first approach) whose reachability to check (or at least, such objects may make your application need more careful tuning of GC generation sizes). Thus the second approach may be better for GC performance. However, to see whether it makes a difference in the real situation, one should make a benchmark oneself.
Serialization speed: the speed of (de)serializing a large array of primitives on disk is only limited by HDD speed; serializing many small objects is inevitably slower (especially if you use Java's default serialization).

Therefore I have used the second approach quite often for very large collections. But of course, if you have enough memory and don't care about serialization, the first approach is simpler.

answered Oct 11 '22 18:10

Jaan

Related questions
                            
                                What if a finalizer makes an object reachable?
                            
                                Can't get Spring MVC dispatcher to work properly when url pattern is a path
                            
                                PermGen space issue with Glassfish/Hibernate
                            
                                Return number of rows in 2d array?
                            
                                How cancel the execution of a SwingWorker?
                            
                                Eclipse's workspace: Shall I put my images in 'src' or 'bin' folder?
                            
                                abstract compareTo method not being overrided
                            
                                Make a simple timer in Java
                            
                                Best way to constrain user to enter a time in a JTextField
                            
                                Overriding Constructors
                            
                                Show prompt before closing JFrame [duplicate]
                            
                                Scala: load Java properties [closed]
                            
                                Clear edittext focus and hide keyboard when button is pressed
                            
                                Akka: How to find the current nodes in a cluster?
                            
                                How to replaceAll special characters in a string? [duplicate]
                            
                                mvn -version error: We cannot execute /usr/libexec/java_home/bin/java
                            
                                How to change the font of tabLayout with a custom font / calligraphy
                            
                                Convert int[] to comma-separated string
                            
                                Finding Memory Usage in Java
                            
                                How do I best catch up with the latest developments in java? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With