Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory efficient conversion between a HashMap and a Vec

I'm trying to convert a big HashMap<K, V> to Vec<(K, V)>. The usual way of doing it looks like this:

// initialize HashMap
let cap = 50000000;
let mut hm: HashMap<usize, usize> = HashMap::new();
for i in 0..cap {
    hm.insert(i, i);
}
// convert HashMap to Vec
let vec = hm.into_iter().collect::<Vec<(usize, usize)>>();

This code does not work well if HashMap is big enough - in the beginning of the call to collect(), the original HashMap will still be in memory and Vec will be allocated with the capacity of lower size hint taken from the Iterator. This causes an out of memory panic for really big HashMaps, even though I should be able to convert between these two types with very little additional memory overhead. So far I have came up with the following solution:

// create small vector
let mut vec: Vec<(usize, usize)> = Vec::with_capacity(100);
for i in hm.into_iter() {
    vec.push(i);
    // reserve few megabytes
    if vec.capacity() - vec.len() < 10 {
        vec.reserve_exact(1000000);
    }
}

Is there any better (more efficient or more idiomatic) approach to this problem? I am willing to use unsafe code if it were to improve performance.

Edit As pointed out into_iter does not deallocate during iteration, so the proposed solution does not work as intended. Is there any other way of converting these collections apart from dumping HashMap to the file and then reading that file into Vec?

like image 865
Fuine Avatar asked Jun 09 '26 02:06

Fuine


2 Answers

Allocating the exact amount needed up front is the memory- and time-efficient solution.

Say you want to create a vector with 100 items. If you were to allocate space for 50 items, when you go to add item 51, two possibilities exist:

  1. The allocation can be extended in place and you continue on your merry way.
  2. The allocation can not be extended in place, so a new, larger allocation is made. All the data needs to be copied from the previous allocation; likely an O(n) operation. During this copy, both allocations are live, taking up 50 + 100 slots, more space than if the original allocation was properly sized.

It's not possible to know which case will happen, so you have to assume the worst.

This is one of the reasons that Iterator has the size_hint method: knowing how many items to allocate for is more efficient.

On the flip side, the HashMap likely stores the data in one big allocation as it's more efficient. This means that it's not possible (or maybe not easy / effective) to move one item out and then decrease the allocation. Even if you could do this, at the beginning of the copy you'd have both the entire HashMap and Vec allocated.

There are two possibilities I can think of that could improve the situation:

  1. If HashMap stores the data internally in a Vec, then potentially a method could be added to HashMap that returns that Vec after some last-minute sanitization.
  2. Avoid storing the HashMap and/or Vec at all. For example, if you need to iterate over the data, you don't need to collect to a Vec first; just iterate over it.
like image 185
Shepmaster Avatar answered Jun 10 '26 15:06

Shepmaster


It seems that you're not satisfied with Vec's implementation of FromIterator trait. I don't know if it is reasonable to change it in std. However, you can introduce a wrapper for Vec and implement FromIterator as you wish:

#[derive(Debug)]
struct OptimizedVec<T>(Vec<T>);

impl<T> std::iter::FromIterator<T> for OptimizedVec<T> {
    #[inline]
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> OptimizedVec<T> {
        let mut vec = Vec::with_capacity(100);
        for i in iter {
            vec.push(i);
            // reserve few megabytes
            if vec.capacity() - vec.len() < 10 {
                vec.reserve_exact(1000000);
            }
        }
        OptimizedVec(vec)
    }
}

//...
let vec: OptimizedVec<_> = hm.into_iter().collect();

The Vec value will be accessible as vec.0.

like image 38
Pavel Strakhov Avatar answered Jun 10 '26 14:06

Pavel Strakhov