Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can a vector be moved and modified without an extra allocation?

Consider the following code:

let u: Vec<u8> = (64..74).collect();
let v: Vec<u8> = u.iter().map(|i| i + 1).collect();

u was not moved, therefore v was inevitably newly allocated.

But if I do the following:

let w: Vec<u8> = u.into_iter().map(|i| i + 1).collect();

u was moved and w is the name of its transformation. Here is some pseudo-code representing what I mean:

mark u as "moved"
for i = 0..10:
    u[i] += 1
w = u

There is (in my opinion) no need for a new allocation, since we map a type to itself. This wouldn't be the case for this code:

let t: Vec<u8> = (64..74).collect();
let s: String = t.into_iter().map(|i| i as char).collect();

To summarize my question

Is there an allocation of a new Vec when we convert a Vec into an iterator and then map this iterator to an iterator on elements of the same type and then collect the result into a Vec?

And if there is indeed an allocation, why?

I tried to --emit=mir, but I wasn't able to find the answer. I'm using rustc 1.20 nightly (if that matters).

If you want to play with code: Try it online!

like image 582
jferard Avatar asked Aug 18 '17 10:08

jferard


People also ask

What happens when you add three vectors in a given order?

Interestingly enough, the order in which three vectors are added has no effect upon either the magnitude or the direction of the resultant. The resultant will still have the same magnitude and direction. For example, consider the addition of the same three vectors in a different order.

What is the addition of vectors?

One such operation is the addition of vectors. Two vectors can be added together to determine the result (or resultant). This process of adding two or more vectors has already been discussed in an earlier unit.

Can two vectors be added together to determine the result?

Two vectors can be added together to determine the result (or resultant). This process of adding two or more vectors has already been discussed in an earlier unit.

How do you exchange the contents of a vector?

If there is ever a need to exchange the contents of vectors, there's another simple function in the form of std::vector::swap. It takes a vector as an argument which is passed by reference, and the vectors have their contents exchanged. The vector passed in shouldn't, therefore, be const.


2 Answers

Let's see the source of the implementation of into_iter() for Vec<T>:

fn into_iter(mut self) -> IntoIter<T> {
    unsafe {
        let begin = self.as_mut_ptr();
        assume(!begin.is_null());
        let end = if mem::size_of::<T>() == 0 {
            arith_offset(begin as *const i8, self.len() as isize) as *const T
        } else {
            begin.offset(self.len() as isize) as *const T
        };
        let cap = self.buf.cap();
        mem::forget(self);
        IntoIter {
            buf: Shared::new(begin),
            cap: cap,
            ptr: begin,
            end: end,
        }
    }
}

Creating the IntoIter iterator incurs several extra allocations, but not for the elements of the vector; instead, the vector's underlying memory details are registered. How about the code behind map()?

fn map<B, F>(self, f: F) -> Map<Self, F> where
    Self: Sized, F: FnMut(Self::Item) -> B,
{
    Map{iter: self, f: f}
}

No extra vectors allocated here either. The last piece of the puzzle is collect():

fn collect<B: FromIterator<Self::Item>>(self) -> B where Self: Sized {
    FromIterator::from_iter(self)
}

No answers here; what about the implementation of from_iter() for Vec<T>?

impl<T> FromIterator<T> for Vec<T> {
    #[inline]
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Vec<T> {
        <Self as SpecExtend<T, I::IntoIter>>::from_iter(iter.into_iter())
    }
}

This is beginning to look like magic, but perhaps the related SpecExtend code will reveal what we're looking for:

impl<T, I> SpecExtend<T, I> for Vec<T>
    where I: Iterator<Item=T>,
{
    default fn from_iter(mut iterator: I) -> Self {
        // Unroll the first iteration, as the vector is going to be
        // expanded on this iteration in every case when the iterable is not
        // empty, but the loop in extend_desugared() is not going to see the
        // vector being full in the few subsequent loop iterations.
        // So we get better branch prediction.
        let mut vector = match iterator.next() {
            None => return Vec::new(),
            Some(element) => {
                let (lower, _) = iterator.size_hint();
                let mut vector = Vec::with_capacity(lower.saturating_add(1));
                unsafe {
                    ptr::write(vector.get_unchecked_mut(0), element);
                    vector.set_len(1);
                }
                vector
            }
        };
        <Vec<T> as SpecExtend<T, I>>::spec_extend(&mut vector, iterator);
        vector
    }

    default fn spec_extend(&mut self, iter: I) {
        self.extend_desugared(iter)
    }
}

In this code we can finally see the Vec::new and Vec::with_capacity methods called which allocate fresh space for the resulting vector.

TL;DR: no, it is not possible to move and modify a vector without an extra allocation.

like image 163
ljedrz Avatar answered Sep 22 '22 03:09

ljedrz


Is there an allocation of a new Vec when we convert a Vec into an iterator and then map this iterator to an iterator on elements of the same type and then collect the result into a Vec?

Yes, the necessary optimization to avoid that allocation is simply too high-level to perform for any of the compiler tiers since they would have to reason about heap allocations and the pointer magic going on in the unsafe code blocks that underpin the Vec implementation. Doubly so in the general case where drop-handlers, panics inside the map and similar things come into play.

It might be possible to optimize at the library level by specializing
collect() -> Vec<T> for specific types such as std::iter::Map<std::vec::IntoIter<T>, _>, but those optimizations would have to be applied on a case-by-case basis where it is known to be safe.

if I understand correctly, your code is approximately the Rust translation of my pseudo-code, without the original "functional style".

For the functional style there is a plan to add for_each to iterators, so if you take an iter_mut something equivalent could be achieved with far fewer heroics on the library or compiler side.

And it would actually reflect the intent of doing something in place more directly instead of just doing some optimization magically which then could surprisingly break under slight modification.

like image 38
the8472 Avatar answered Sep 18 '22 03:09

the8472