Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to map a function over a Vec without allocating a new Vec?

Tags:

rust

I have the following:

enum SomeType {
    VariantA(String),
    VariantB(String, i32),
}

fn transform(x: SomeType) -> SomeType {
    // very complicated transformation, reusing parts of x in order to produce result:
    match x {
        SomeType::VariantA(s) => SomeType::VariantB(s, 0),
        SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
    }
}

fn main() {
    let mut data = vec![
        SomeType::VariantA("hello".to_string()),
        SomeType::VariantA("bye".to_string()),
        SomeType::VariantB("asdf".to_string(), 34),
    ];
}

I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.

As a work-around, I've added a Dummy variant to SomeType and then do the following:

for x in &mut data {
    let original = ::std::mem::replace(x, SomeType::Dummy);
    *x = transform(original);
}

This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?

like image 300
Michael Avatar asked Jan 24 '17 15:01

Michael


People also ask

What does VEC mean in Rust?

A contiguous growable array type, written as Vec<T> , short for 'vector'.

How do I make a new vector in Rust?

In order to initialize a vector via the new() method call, we use the double colon operator: let mut vec = Vec::new(); This call constructs a new, empty Vec<T> . The vector will not allocate until elements are pushed onto it.

What is a VEC u8?

Vec<u8> is like Box<[u8]> , except it additionally stores a "capacity" count, making it three machine words wide. Separately stored capacity allows for efficient resizing of the underlying array. It's the basis for String .


2 Answers

Your first problem is not map, it's transform.

transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?


The best fix, thus, is to change the signature of transform to:

fn transform(x: &mut SomeType) { ... }

then you can just do:

for x in &mut data { transform(x) }

Other solutions will be clunky, as they will need to deal with the fact that transform might panic.

like image 65
Matthieu M. Avatar answered Dec 26 '22 20:12

Matthieu M.


No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).

Even when the sizes are the same, it's non-trivial.

In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:

impl SomeType {
    fn transform(&mut self) {
        use SomeType::*;

        let old = std::mem::replace(self, VariantA(String::new()));

        // Note this line for the detailed explanation

        *self = match old {
            VariantA(s) => VariantB(s, 0),
            VariantB(s, i) => VariantB(s, 2 * i),
        };
    }
}
for x in &mut data {
    x.transform();
}

An alternate implementation that just replaces the String:

impl SomeType {
    fn transform(&mut self) {
        use SomeType::*;

        *self = match self {
            VariantA(s) => {
                let s = std::mem::replace(s, String::new());
                VariantB(s, 0)
            }
            VariantB(s, i) => {
                let s = std::mem::replace(s, String::new());
                VariantB(s, 2 * *i)
            }
        };
    }
}

In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .

See also:

  • Change enum variant while moving the field to the new variant

Why is it so complicated?

See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.

If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.

You even hinted at this (emphasis mine):

I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop

That "should" is the problem. During a panic, that dummy value is visible.

See also:

  • How can I swap in a new value for a field in a mutable reference to a structure?
  • Temporarily move out of borrowed content
  • How do I move out of a struct field that is an Option?

The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.

like image 41
Shepmaster Avatar answered Dec 26 '22 21:12

Shepmaster