Why is it that the more '1' bits my key has, the longer it takes to place it in the HashMap?

I'm doing a project for a class which focuses on storing a huge matrix with mostly 0 values in memory and performing some matrix math on it. My first thought was to use a HashMap to store the matrix elements, and only store the elements which are non-zero, in order to avoid using huge quantities of memory.

I wanted to make a key for the HashMap which would represent both the row and column number of the element in a way that, when I accessed that entry in the map, I could re-extract both values. I don't know Java as well as C#; in C# I would make a struct with Row and Column members, but in Java I quickly realized there are no user value types. With a deadline looming I went with a safe bet and made the key a long. I stored the row data (a 32-bit int) in the first 32 bits and the column data in the last 32, using some very simple bit shifting. [EDIT: I'd also like to note that my HashMap is initialized with a specific initial size which exactly matches the number of values I store in it, and which is never exceeded.]

[Side note: the reason I want to be able to extract the row/column data again is to greatly increase the efficiency of matrix multiplication, from O(n^2) to O(n), and a smaller n to boot]
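
For completeness, here is a minimal sketch of how the row and column can be packed into and unpacked from such a key (the pack/rowOf/columnOf names are just for illustration), assuming both values are non-negative 32-bit ints:

final class RowColKey {
    // Pack the row into the upper 32 bits and the column into the lower 32 bits.
    static long pack(int row, int column) {
        return ((long) row << 32) | (column & 0xFFFFFFFFL);
    }
    // Recover the row from the upper 32 bits.
    static int rowOf(long key) {
        return (int) (key >>> 32);
    }
    // Recover the column from the lower 32 bits.
    static int columnOf(long key) {
        return (int) key;
    }
}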

What I noticed after implementing this structure is that it takes a whopping 7 seconds to read a 23426 x 23426 matrix from a text file in which only non-zero elements are given, but only 2 seconds to calculate the eigenvalues we are required to produce! After selectively commenting out methods, I have concluded that the bulk of this 7-second timespan is spent storing my values in the HashMap.

public void Set(double value, int row, int column) {
    //assemble the long key, placing row and column in adjacent sets of bits
    long key = (long)row << SIZE_BIT_MAX; //(SIZE_BIT_MAX is 32)
    key += column;
    elements.put(key, value);
}

That is the code for setting a value. If I use this method instead:

public void Set(double value, int row, int column) {
    //create a distinct but smaller key (around 32 bits max)
    long key = (long)(row * matrixSize) + column;
    elements.put(key, value);
}

the reading only takes 2 seconds. Both versions of the key are distinct for every element, both are of type long, and the code to create either of them is trivial. It's the elements.put(key, value) call which makes the difference between 7 seconds and 2.

My question is, why? The difference I see between these key versions is that the first one has bits set to 1 throughout and more frequently, while the second has all of its highest 32 bits set to 0. Am I chasing a red herring, or is this fairly dramatic difference in performance the result of something internal in the HashMap.put method?

asked Feb 16 '12 by A-Type


People also ask

Why is HashMap faster?

The reason that HashMap is faster than HashSet is that HashMap uses unique keys to access its values: each value is stored under a corresponding key and can be retrieved quickly by that key. A HashSet, by contrast, is based entirely on the stored objects themselves, so retrieving a value is slower.
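
A tiny illustrative comparison (a hypothetical example, not from the question):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LookupDemo {
    public static void main(String[] args) {
        // HashMap: a value is fetched directly by its key.
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);
        Integer age = ages.get("alice");

        // HashSet: only membership of the whole object can be tested.
        Set<String> names = new HashSet<>();
        names.add("alice");
        boolean present = names.contains("alice");

        System.out.println(age + " " + present);
    }
}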

How does the object's hashCode quality influence the HashMap performance?

Since HashMap uses hashCode to calculate which bucket of the hash table to use, returning 1 from hashCode effectively makes your HashMap perform like an (unsorted) LinkedList. Returning random values will simply break your HashMap, since equal objects will no longer have equal hashCodes.
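
As an illustration (a made-up class, not from the question), a constant hashCode sends every entry into the same bucket, so puts and lookups degrade badly (a linear scan in older JDKs; Java 8+ falls back to a tree within the bucket):

import java.util.HashMap;
import java.util.Map;

final class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int hashCode() {
        return 1; // constant hash: every key collides in one bucket
    }
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 50_000; i++) {
            map.put(new BadKey(i), i); // each put searches one ever-growing bucket
        }
        System.out.println(map.size());
    }
}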

How the HashMap works?

HashMap uses multiple buckets, and each bucket points to a singly linked list where the entries (nodes) are stored. Once the hash function identifies the bucket from the key's hashCode, the map checks whether that bucket (the singly linked list) already contains a key with the same hashCode and an equal key.
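
For reference, this is roughly how recent OpenJDK versions (Java 8+) spread the hashCode and pick a bucket; a simplified sketch, not the exact source:

final class BucketMath {
    // XOR the high 16 bits into the low 16, as HashMap.hash() does, so that
    // small tables are still affected by the upper bits of the hashCode.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    // The bucket index is the spread hash masked by (capacity - 1);
    // this works because the table capacity is always a power of two.
    static int bucketIndex(int hash, int capacity) {
        return (capacity - 1) & hash;
    }
}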

Is a HashMap efficient?

HashMap, being a hashtable-based implementation, internally uses an array-based data structure to organize its elements according to the hash function. It provides expected constant-time, O(1), performance for most operations such as put(), get() and containsKey(), and is therefore significantly faster than a TreeMap.


1 Answer

Take a look at how Long implements the hashCode() method (at least in OpenJDK 7):

public int hashCode() {
    return (int)(value ^ (value >>> 32));
}

This means that your 64-bit key gets folded back into 32 bits: the upper half (your row) is XORed onto the lower half (your column), so the hash code is effectively row ^ column. Many different (row, column) pairs therefore collapse onto the same hash code, resulting in a lot of collisions, which forces the HashMap to walk long chains of entries within a bucket. Your second method avoids that problem because every key's generated hash code is a unique value: you only have 23426 x 23426 = 548,777,476 possible positions, so row * matrixSize + column fits comfortably into 32 bits and the XOR with the all-zero upper half leaves it unchanged.
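
You can see the collapse directly by counting distinct hash codes for both key schemes (an illustrative sketch; matrixSize is the value from the question and the loop bound is arbitrary):

import java.util.HashSet;
import java.util.Set;

class KeyHashDemo {
    public static void main(String[] args) {
        int matrixSize = 23426;
        int n = 1000; // sample a corner of the matrix
        Set<Integer> shiftedHashes = new HashSet<>();
        Set<Integer> multipliedHashes = new HashSet<>();
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                long k1 = ((long) row << 32) + col;      // shifted key
                long k2 = (long) row * matrixSize + col; // multiplied key
                shiftedHashes.add(Long.valueOf(k1).hashCode());    // effectively row ^ col
                multipliedHashes.add(Long.valueOf(k2).hashCode()); // unique for these values
            }
        }
        // Expect roughly 1,000 distinct hashes for the shifted keys
        // versus 1,000,000 for the multiplied keys.
        System.out.println("shifted:    " + shiftedHashes.size());
        System.out.println("multiplied: " + multipliedHashes.size());
    }
}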

So the reason is your choice of key, not the number of set bits.

However, what exactly do you mean by "user value types"?

public class MatrixKey {
    private final int row;
    private final int column;
    public MatrixKey(int row, int column) {
        this.row = row;
        this.column = column;
    }
    public int getRow() { return row; }
    public int getColumn() { return column; }
}

This class can make a perfectly good key for a Map in Java once you implement hashCode() and equals(). Just make sure that you don’t implement its hashCode method the way Long does. :)
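
For example, a sketch of those two methods for the MatrixKey class above (the 31 * row + column recipe is just the usual two-field hash, not the only sensible choice):

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof MatrixKey)) return false;
    MatrixKey other = (MatrixKey) o;
    return row == other.row && column == other.column;
}

@Override
public int hashCode() {
    // Scales the row so rows and columns don't simply cancel out,
    // unlike the XOR fold used by Long.hashCode().
    return 31 * row + column;
}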

answered Sep 22 '22 by Bombe