Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I take an item from a Vec in Rust?

I'm looking for a method that consumes a Vec and returns one element, without the overhead of restoring Vec's invariants the way remove and swap_remove do:

fn take<T>(vec: Vec<T>, index: usize) -> Option<T>

However, I can't find such a method. Am I missing something? Is this actually unsafe or impossible?

This is a different question from Built in *safe* way to move out of Vec<T>? There the goal was a remove method that didn't panic on out of bounds access and returned a Result. I'm looking for a method that consumes a Vec and returns one of the elements. None of the answers to the above question address my question.

like image 595
Others Avatar asked Aug 21 '17 19:08

Others


2 Answers

You can write your function like this:

fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
    if vec.get(index).is_none() {
        None
    } else {
        Some(vec.swap_remove(index))
    }
}

The code you see here (get and swap_remove) is guaranteed O(1).

However, kind of hidden, vec is dropped at the end of the function and this drop operation is likely not O(1), but O(n) (where n is vec.len()). If T implements Drop, then drop() is called for every element still inside the vector, meaning dropping the vector is guaranteed O(n). If T does not implement Drop, then the Vec only needs to deallocate the memory. The time complexity of the dealloc operation depends on the allocator and is not specified, so we cannot assume it is O(1).


To mention another solution using iterators:

fn take<T>(vec: Vec<T>, index: usize) -> Option<T> {
    vec.into_iter().nth(index)
}

I was about to write this:

While Iterator::nth() usually is a linear time operation, the iterator over a vector overrides this method to make it a O(1) operation.

But then I noticed, that this is only true for the iterator which iterates over slices. The std::vec::IntoIter iterator which would be used in the code above, doesn't override nth(). It has been attempted here, but it doesn't seem to be that easy.

So, as of right now, the iterator solution above is a O(n) operation! Not to mention the time needed to drop the vector, as explained above.

like image 142
Lukas Kalbertodt Avatar answered Oct 02 '22 08:10

Lukas Kalbertodt


The reason fn take<T>(vec: Vec<T>, index: usize) -> Option<T> does not exist in the standard library is that it is not very useful in general. For example, supposing that you have a Vec<String> of length 10, it means throwing away 9 strings and only using 1. This seems wasteful.

In general, the standard library will try to provide an API that is useful in a maximum of scenarios, and in this instance it would be more logical to have a fn take<T>(vec: &mut Vec<T>, index: usize) -> Option<T>.

The only question is how to preserve the invariant, of course:

  • it can be preserved by exchanging with the last element, which is what Vec::swap_remove does,
  • it can be preserved by shifting the successor elements in, which is what Vec::drain does.

Those are very flexible, and can be adapted to fill more specific scenarios, such as yours.


Adapting swap_remove:

fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
    if index < vec.len() {
        Some(vec.swap_remove(index))
    } else {
        None
    }
}

Adapting drain:

fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
    if index < vec.len() {
        vec.drain(index..index+1).next()
    } else {
        None
    }
}

Noting that the former is more efficient: it's O(1).


I'm looking for a method that consumes the Vec and returns one element, without the overhead of restoring Vec's invariants the way remove and swap_remove do.

This reeks of premature micro-optimization to me.

First of all, note that it is necessary to destroy the elements of the vector; you can accomplish this in two ways:

  1. swap_remove, then iterate over each element to destroy them,
  2. Iterate over each element to destroy them, skipping the specific index.

It is not clear to me that the latter would be faster than the former; if anything it looks more complicated, with more branches (I advise two loops), which may throw off the predictor and may be less amenable to vectorization.

Secondly, before complaining about the overhead of restoring the Vec's invariant, have you properly profiled the solution?

If we look at the swap_remove variant, there are 3 steps:

  1. swap_remove (O(1)),
  2. destroy each remaining element (O(N)),
  3. free the backing memory.

Step 2 may be optimized out if the element has no Drop implementation, but otherwise I would be it's a toss whether (2) or (3) is dominating the cost.

TL;DR: I am afraid that you are fighting ghost issues, profile before trying to optimize.

like image 21
Matthieu M. Avatar answered Oct 02 '22 08:10

Matthieu M.