Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why big array java is slow

I created an array of a class with big length, ~150M elements, sorted by key (describe bellow). Then I build a simple http server to feedback each request as a binary search function on the array. (I'm sure the server's work is fine)

Data initiation is just fine (ofcouse quite slow). Binary search function is fast as expected.

The problem is: the respond is just fast for a while (10mins, 1 hours... many kind of time range), then server take quite long (some minutes) to do binary search function for a request, then it's fast back, then slow after a while... While it's slow, I check server status (htop), it seems like jvm is in GC.

The problem is not happened when I split big array into smaller ones, for ex: 10 arrays of 15M elements, I find the target array before do searching on. So I guess something happens in JVM when I create too large array

(Edit: I have no problem on spliting big array into pieces because I implemented the "SiteInfo" object to native, big amount of objects in JVM is decreased. So the problem caused by too many objects I created as replies bellow, thank you guys)

Guys do you have any idea for my problem?

(I posted my code, there's some Pseudocode that I think it's not really important)

public static class Token2TopSite implements Comparable<Token2TopSite> {

    public final String token; // this is key for binary search
    public final SiteInfo[] topSites; // just data, not important at this question, I think

    public Token2TopSite(String token, SiteInfo[] topSites) {
        this.token = token;
        this.topSites = topSites;
    }

    @Override
    public int compareTo(Token2TopSite o) {
        return token.compareTo(o.token);
    }

    public static void main(String[] args) {
        Token2TopSite[] array = new Token2TopSite[150 * 1000000];
        ...; // init data for array, this runs properly
        Arrays.sort(array);
        startServerOnArray(array); // each request is a element search on the array
    }
} 
like image 809
linkinsteps Avatar asked Aug 01 '14 10:08

linkinsteps


People also ask

Is ArrayList slow Java?

ArrayList s are more than twice as slow as the corresponding array.

Is ArrayList slower than array Java?

An Array is a collection of similar items. Whereas ArrayList can hold item of different types. An array is faster and that is because ArrayList uses a fixed amount of array.

How do you handle a large array in Java?

Java arrays are indexed by int, so an array can't get larger than 2^31 (there are no unsigned ints). So, the maximum size of an array is 2147483648, which consumes (for a plain int[]) 8589934592 bytes (= 8GB). Thus, the int-index is usually not a limitation, since you would run out of memory anyway.

Is ArrayList faster than list Java?

Array is faster and that is because ArrayList uses a fixed amount of array. However when you add an element to the ArrayList and it overflows. It creates a new Array and copies every element from the old one to the new one.


2 Answers

I think Omry Yadan's diagnosis is probably correct. These sound like GC pauses, and they are likely to be particularly bad if you have huge numbers of long-lived reachable objects in the heap. The GC has to traverse all of the live objects when doing a "full" collection.

(You can confirm that this is really a GC-related problem by enabling GC logging, and comparing the times when your server performance is slow with the GC events.)

However, I disagree with his suggested solution.

Rather than rewriting the application, a simpler solution is to configure the JVM to use a "concurrent" or "low pause" garbage collector. It is just a matter of setting some parameters on the command that that starts your webserver's JVM.

Here are some Oracle references:

  • Garbage Collection Basics.
  • HotSpot VM Options
  • Java HotSpot Garbage Collection
like image 115
Stephen C Avatar answered Oct 22 '22 11:10

Stephen C


The JVM does not work well when there are more than 10s of millions of objects. the reason is that the garbage collector, when looking for free memory - needs to traverse all your objects if it does not find anything to free. one solution is to use less objects. I wrote a primitive collections library called Banana that does it's own memory management. it basically creates a single (or several) int array (int[]) and let you build dynamic data structures on top of that (several are already implemented, HashTable, LinkedList, LRUCache and more).

like image 37
Omry Yadan Avatar answered Oct 22 '22 11:10

Omry Yadan