Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the purpose of SizeHint in Iterator::unzip?

Tags:

rust

From the Rust standard library implementation of unzip:

fn unzip<A, B, FromA, FromB>(self) -> (FromA, FromB) where
    FromA: Default + Extend<A>,
    FromB: Default + Extend<B>,
    Self: Sized + Iterator<Item=(A, B)>,
{
    struct SizeHint<A>(usize, Option<usize>, marker::PhantomData<A>);
    impl<A> Iterator for SizeHint<A> {
        type Item = A;

        fn next(&mut self) -> Option<A> { None }
        fn size_hint(&self) -> (usize, Option<usize>) {
            (self.0, self.1)
        }
    }

    let (lo, hi) = self.size_hint();
    let mut ts: FromA = Default::default();
    let mut us: FromB = Default::default();

    ts.extend(SizeHint(lo, hi, marker::PhantomData));
    us.extend(SizeHint(lo, hi, marker::PhantomData));

    for (t, u) in self {
        ts.extend(Some(t));
        us.extend(Some(u));
    }

    (ts, us)
}

These two lines:

ts.extend(SizeHint(lo, hi, marker::PhantomData));
us.extend(SizeHint(lo, hi, marker::PhantomData));

don't actually extend ts or us by anything, since the next method of SizeHint returns None. What's the purpose of doing so?

like image 858
qed Avatar asked May 06 '16 14:05

qed


2 Answers

It is a cool trick. By giving this size hint, it gives ts and us the chance to reserve the space for the extend calls in the loop. According to the documentation

size_hint() is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g. omit bounds checks in unsafe code. An incorrect implementation of size_hint() should not lead to memory safety violations.

Note that the creation of SizeHint is necessary because the extend call in for loop is made with a Some value (Optional implements the Iterator trait), and the size_hint for a Some value is (1, Some(1)). That doesn't help with pre allocation.

But looking at the code for Vec, this will have no effect (neither in HashMap and VecDeque). Others Extend implementations may be different.

The execution of ts.extend(SizeHint(lo, hi, marker::PhantomData)); does not trigger a resize, since next returns None. Maybe some one should write a patch.

impl<T> Vec<T> {
    fn extend_desugared<I: Iterator<Item = T>>(&mut self, mut iterator: I) {
        // This function should be the moral equivalent of:
        //
        //      for item in iterator {
        //          self.push(item);
        //      }
        while let Some(element) = iterator.next() {
            let len = self.len();
            if len == self.capacity() {
                let (lower, _) = iterator.size_hint();
                self.reserve(lower.saturating_add(1));
            }
            unsafe {
                ptr::write(self.get_unchecked_mut(len), element);
                // NB can't overflow since we would have had to alloc the address space
                self.set_len(len + 1);
            }
        }
    }
}
like image 167
malbarbo Avatar answered Nov 10 '22 17:11

malbarbo


It's a dubious hack!

It implements an iterator with a fake (overestimated) size hint to encourage the produced collection to reserve the eventually appropriate capacity up front.

Cool trick but, it does so by implementing a size hint where the estimated lower bound is greater than the actual number of elements produced (0). If the lower bound is not known, the iterator should return a lower bound of 0. This implementation is arguably very buggy for this reason, and the collection's Extend impl may react with bugginess as a result (but of course not memory unsafety.)

like image 5
bluss Avatar answered Nov 10 '22 17:11

bluss