Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster HashMap for sequential keys

Initially I was very surprised to find out Rust's HashMap, even with the FNV hasher, was considerably slower than the equivalents in Java, .NET, PHP. I am talking about optimized Release mode, not Debug mode. I did some calculations and realized the timings in Java/.NET/PHP were suspiciously low. Then it hit me - even though I was testing with a big hash table (millions of entries), I was reading mostly sequential key values (like 14, 15, 16, ...), which apparently resulted in lots of CPU cache hits, due to the way the standard hash tables (and hash-code functions for integers and short strings) in those languages are implementated, so that entries with nearby keys are usually located in nearby memory locations.

Rust's HashMap, on the other hand, uses the so called SwissTable implementation, which apparently distributes values differently. When I tested reading by random keys, everything fell into place - the "competitors" scored behind Rust.

So if we are in a situation where we need to perform lots of gets sequentially, for example iterating some DB IDs that are ordered and mostly sequential (with not too many gaps), is there a good Rust hash map implementation that can compete with Java's HashMap or .NET's Dictionary?


P.S. As requested in the comments, I paste an example here. I ran lots of tests, but here is a simple example that takes 75 ms in Rust (release mode) and 20 ms in Java:

In Rust:

let hm: FnvHashMap<i32, i32> = ...;  
// Start timer here
let mut sum: i64 = 0;
for i in 0..1_000_000 {
    if let Some(x) = hm.get(&i) {
        sum += *x as i64;
    }
}
println!("The sum is: {}", sum);

In Java:

Map<Integer, Integer> hm = ...;
// Start timer here
long sum = 0;
for (int i = 0; i < 1_000_000; i++) {
    sum += hm.get(i);
}

With HashMap<i32, i32> and its default SipHash hasher it took 190 ms. I know why it's slower than FnvHashMap. I'm just mentioning that for completeness.

like image 208
at54321 Avatar asked Dec 07 '25 02:12

at54321


1 Answers

First, here is some runnable code to measure the efficiency of the different implementations:

use std::{collections::HashMap, time::Instant};

fn main() {
    let hm: HashMap<i32, i32> = (0..1_000_000).map(|i| (i, i)).collect();
    let t0 = Instant::now();
    let mut sum = 0;
    for i in 0..1_000_000 {
        if let Some(x) = hm.get(&i) {
            sum += x;
        }
    }
    let elapsed = t0.elapsed().as_secs_f64();
    println!("{} - The sum is: {}", elapsed, sum);
}

On the old desktop machine I'm writing this on, it reports 76 ms to run. Since the machine is 10+ years old, I find it baffling that your hardware would take 190 ms to run the same code, so I'm wondering how and what you're actually measuring. But let's ignore that and concentrate on the relative numbers.

When you care about hashmap efficiency in Rust, and when the keys don't come from an untrusted source, the first thing to try should always be to switch to a non-DOS-resistant hash function. One possibility is the FNV hash function from the fnv crate, which you can get by switching HashMap to fnv::FnvHashMap. That brings performance to 34 ms, i.e. a 2.2x speedup.

If this is not enough, you can try the hash from the rustc-hash crate, which provides the hash function developed for the Rust compiler, originally adapted from one used by Firefox. Not based on any formal analysis, it underperforms on hash function test suites, but is reported to consistently outperform FNV. That's confirmed on the above example, where switching from FnvHashMap to rustc_hash::FxHashMap drops the time to 28 ms, i.e. a 2.7x speedup from the initial timing.

Finally, if you want to just imitate what C# and Java do, and could care less about certain patterns of inserted numbers leading to degraded performance, you can use the aptly-named nohash_hasher crate that gives you an identity hash. Changing HashMap<i32, i32> to HashMap<i32, i32, nohash_hasher::BuildNoHashHasher<i32>> drops the time to just under 4 ms, i.e. a whopping 19x speedup from the initial timing.

Since you report the Java example to be 9.5x faster than Rust, a 19x speedup should make your code approximately twice as fast as Java.

like image 186
user4815162342 Avatar answered Dec 08 '25 17:12

user4815162342



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!