 

Initialize a Vec with not-None values only

Tags:

rust

If I have variables like this:

let a: u32 = ...;
let b: Option<u32> = ...;
let c: u32 = ...;

what is the shortest way to build a vector of those values, so that b is included only if it is Some?

In other words, is there something simpler than this:

let v = match b {
    None => vec![a, c],
    Some(x) => vec![a, x, c],
};

P.S. I would prefer a solution where we don't need to use the variables more than once. Consider this example:

let some_person: String = ...;
let best_man: Option<String> = ...;
let a_third_person: &str = ...; 
let another_opt: Option<String> = ...;
...

As can be seen, we might have to use longer variable names, more than one Option (None), expressions (like a_third_person.to_string()), etc.

at54321 asked Dec 30 '21 10:12


4 Answers

After some thinking and investigating, I've come up with the following crazy thing.


The end goal is a macro, optional_vec![], to which you can pass either T or Option<T>, and which behaves as described in the question. However, I decided on a strong restriction: it should have the best performance possible. So, you write:

optional_vec![a, b, c]

And get at least the performance of a hand-written match, if not better. This rules out the simple [Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>(), suggested in my other answer (though even that solution needs some way to differentiate between Option<T> and plain T, which, as we'll see, is not an easy problem at all).
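For reference, the simple iterator-based form mentioned above can be sketched as follows. This is only an illustration of that baseline, not the macro this answer builds; the function name simple_vec is made up for the example, and the fully qualified IntoIterator::into_iter call is used so that the array is iterated by value regardless of edition (in the 2021 edition the method-call form quoted above does the same).

```rust
// Hypothetical helper: wrap the plain values in Some, then let
// flatten() drop every None while preserving the original order.
fn simple_vec(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    IntoIterator::into_iter([Some(a), b, Some(c)])
        .flatten()
        .collect()
}

fn main() {
    println!("{:?}", simple_vec(1, Some(2), 3));
    println!("{:?}", simple_vec(1, None, 3));
}
```

Every plain value still has to be wrapped in Some by hand here, which is exactly the part a macro would have to automate.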

I will first warn that I have not found a way to make my macro work when the element type is itself an Option. That is, if you want to build a vector of Option<T> from Option<T> and Option<Option<T>>, it will not work.


When I design a complex macro, I like to think first about what the expanded code will look like. And in this macro, we have several hard problems to solve.

First, the macro takes plain expressions. But somehow, it needs to switch on their type being T or Option<T>. How should such a thing be done?

The feature we use to do such things is specialization.

#![feature(specialization)]

pub trait Optional {
    fn some_method(self);
}

impl<T> Optional for T {
    default fn some_method(self) {
        // Just T
    }
}

impl<T> Optional for Option<T> {
    fn some_method(self) {
        // Option<T>
    }
}

As you probably noticed, we now have two problems: first, specialization is unstable, and I'd like to stay on stable. Second, what should be inside the trait? The second problem is easier to solve, so let's begin with it.

Turns out that the most performant way to do the pushing to the vector is to pre-allocate capacity (Vec::with_capacity), write to the vector by using pointers (don't push(), it optimizes badly!) then set the length (Vec::set_len()).

We can get a pointer to the internal buffer of the vector using Vec::as_mut_ptr(), and advance the pointer via <*mut T>::add(1).
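As a rough sketch of that pattern for the three values from the question, written by hand and without any trait machinery (the function name build_raw is hypothetical, and this is an illustration under the stated safety assumptions rather than the macro's actual output):

```rust
// Hypothetical helper: pre-allocate, write through a raw pointer,
// then set the length once every slot has been initialized.
fn build_raw(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    // Exact final length: two plain values, plus one if `b` is Some.
    let len = 2 + b.is_some() as usize;
    let mut result: Vec<u32> = Vec::with_capacity(len);

    let mut next = result.as_mut_ptr();
    // SAFETY: `next` starts at the beginning of a buffer with room for at
    // least `len` elements, exactly `len` writes happen below, and
    // `set_len(len)` is called only after all of them are done.
    unsafe {
        next.write(a);
        next = next.add(1);
        if let Some(value) = b {
            next.write(value);
            next = next.add(1);
        }
        next.write(c);
        result.set_len(len);
    }
    result
}

fn main() {
    println!("{:?}", build_raw(1, Some(2), 3));
    println!("{:?}", build_raw(1, None, 3));
}
```

The length is computed before any write so that the capacity is exact and set_len never exposes an uninitialized slot.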

So, we need two methods: one to hint us about the capacity (can be zero for None or one for Some() and non-Option elements), and a write_and_advance() method:

pub trait Optional {
    type Item;
    fn len(&self) -> usize;
    unsafe fn write_and_advance(self, place: &mut *mut Self::Item);
}

impl<T> Optional for T {
    default type Item = Self;
    default fn len(&self) -> usize { 1 }
    default unsafe fn write_and_advance(self, place: &mut *mut Self) {
        place.write(self);
        *place = place.add(1);
    }
}

impl<T> Optional for Option<T> {
    type Item = T;
    fn len(&self) -> usize { self.is_some() as usize }
    unsafe fn write_and_advance(self, place: &mut *mut T) {
        if let Some(value) = self {
            place.write(value);
            *place = place.add(1);
        }
    }
}

It doesn't even compile! For the reason why, see Mismatch between associated type and type parameter only when impl is marked `default`. Luckily for us, the trick we'll use to work around specialization not being stable does work in this situation. But for now, let's assume it works. What will the code using this trait look like?

match (a, b, c) { // The match is here because it's the best binding for lifetimes: see https://stackoverflow.com/a/54855986/7884305
    (a, b, c) => {
        let len = Optional::len(&a) + Optional::len(&b) + Optional::len(&c);
        let mut result = ::std::vec::Vec::with_capacity(len);
        
        let mut next_element = result.as_mut_ptr();
        unsafe {
            Optional::write_and_advance(a, &mut next_element);
            Optional::write_and_advance(b, &mut next_element);
            Optional::write_and_advance(c, &mut next_element);
            
            result.set_len(len);
        }
        
        result
    }
}

And it works! Except that it does not, because the specialization does not compile as I said, and we also want to not repeat all of this boilerplate but insert it into a macro.

So, how do we solve the problems with specialization: being unstable and not working?

dtolnay has a very cool trick he calls autoref specialization (BTW, all of this repo is highly recommended reading!). This is a trick that can be used to emulate specialization. It works only in macros, but we're in a macro so this is fine.

I will not elaborate on the trick here (I recommend reading his post; he also used this trick in the excellent and very widely used anyhow crate). In short, the idea is to trick the typechecker by implementing one trait for T under certain conditions (the specialized impl) and another trait for &T for the general case (these could be inherent impls if not for coherence). Since Rust performs automatic referencing during method resolution, that is, it takes a reference to the receiver as needed, this works: the typechecker will autoref if needed and stop at the first applicable impl, i.e. the specialized impl if it matches, or the general impl otherwise.

Here's an example:

use std::fmt;

pub trait Display {
    fn foo(&self);
}
// Level 1
impl<T: fmt::Display> Display for T {
    fn foo(&self) { println!("Display({}), {}", std::any::type_name::<T>(), self); }
}

pub trait Debug {
    fn foo(&self);
}
// Level 2
impl<T: fmt::Debug> Debug for &T {
    fn foo(&self) { println!("Debug({}), {:?}", std::any::type_name::<T>(), self); }
}

macro_rules! foo {
    ($e:expr) => ((&$e).foo());
}

Playground.
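To see the dispatch happen, here is a small self-contained variant of the example above. The trait names ViaDisplay and ViaDebug, the describe! macro, the OnlyDebug type and the demo function are all made up for this sketch; the methods return a String instead of printing so the result is easy to inspect.

```rust
use std::fmt;

// Level 1 (tried first): implemented for T itself, requires T: Display.
trait ViaDisplay {
    fn describe(&self) -> String;
}
impl<T: fmt::Display> ViaDisplay for T {
    fn describe(&self) -> String {
        format!("Display: {}", self)
    }
}

// Level 2 (fallback): implemented for &T, requires only T: Debug.
// It is reached only after method resolution adds one more reference.
trait ViaDebug {
    fn describe(&self) -> String;
}
impl<'a, T: fmt::Debug> ViaDebug for &'a T {
    fn describe(&self) -> String {
        format!("Debug: {:?}", self)
    }
}

// The macro supplies the extra `&` that makes both levels reachable.
macro_rules! describe {
    ($e:expr) => {
        (&$e).describe()
    };
}

// A type that implements Debug but not Display.
#[derive(Debug)]
struct OnlyDebug(u8);

fn demo() -> (String, String) {
    let number: u32 = 42;
    let opaque = OnlyDebug(7);
    (describe!(number), describe!(opaque))
}

fn main() {
    let (first, second) = demo();
    println!("{}", first); // Display: 42
    println!("{}", second); // Debug: OnlyDebug(7)
}
```

For number, the Display-based impl matches without an extra reference, so it wins; for opaque, that impl does not apply, so resolution adds a reference and lands on the Debug-based impl.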

We can use this trick in our case:

#[doc(hidden)]
pub mod autoref_specialization {
    #[derive(Copy, Clone)]
    pub struct OptionTag;
    pub trait OptionKind {
        fn optional_kind(&self) -> OptionTag; 
    }
    impl<T> OptionKind for Option<T> {
        #[inline(always)]
        fn optional_kind(&self) -> OptionTag { OptionTag }
    }
    impl OptionTag {
        #[inline(always)]
        pub fn len<T>(self, this: &Option<T>) -> usize { this.is_some() as usize }
        #[inline(always)]
        pub unsafe fn write_and_advance<T>(self, this: Option<T>, place: &mut *mut T) {
            if let Some(value) = this {
                place.write(value);
                *place = place.add(1);
            }
        }
    }
    
    #[derive(Copy, Clone)]
    pub struct DefaultTag;
    pub trait DefaultKind {
        fn optional_kind(&self) -> DefaultTag; 
    }
    impl<T> DefaultKind for &'_ T {
        #[inline(always)]
        fn optional_kind(&self) -> DefaultTag { DefaultTag }
    }
    impl DefaultTag {
        #[inline(always)]
        pub fn len<T>(self, _this: &T) -> usize { 1 }
        #[inline(always)]
        pub unsafe fn write_and_advance<T>(self, this: T, place: &mut *mut T) {
            place.write(this);
            *place = place.add(1);
        }
    }
}

And the expanded code will look like:

use autoref_specialization::{DefaultKind as _, OptionKind as _};
match (a, b, c) {
    (a, b, c) => {
        let (a_tag, b_tag, c_tag) = (
            (&a).optional_kind(),
            (&b).optional_kind(),
            (&c).optional_kind(),
        );
        
        let len = a_tag.len(&a) + b_tag.len(&b) + c_tag.len(&c);
        let mut result = ::std::vec::Vec::with_capacity(len);
        
        let mut next_element = result.as_mut_ptr();
        unsafe {
            a_tag.write_and_advance(a, &mut next_element);
            b_tag.write_and_advance(b, &mut next_element);
            c_tag.write_and_advance(c, &mut next_element);
            
            result.set_len(len);
        }
        
        result
    }
}
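As a sanity check that this expansion works on stable without the macro, here is a condensed, hand-expanded sketch for String values. It reuses the tag and trait names from above but places them at the top level instead of in a module, which is an assumption made only to keep the example short; the function name expanded is hypothetical.

```rust
#[derive(Copy, Clone)]
struct OptionTag;
trait OptionKind {
    fn optional_kind(&self) -> OptionTag;
}
impl<T> OptionKind for Option<T> {
    fn optional_kind(&self) -> OptionTag { OptionTag }
}
impl OptionTag {
    fn len<T>(self, this: &Option<T>) -> usize { this.is_some() as usize }
    // Caller must ensure `*place` points at writable, unused capacity.
    unsafe fn write_and_advance<T>(self, this: Option<T>, place: &mut *mut T) {
        if let Some(value) = this {
            unsafe {
                (*place).write(value);
                *place = (*place).add(1);
            }
        }
    }
}

#[derive(Copy, Clone)]
struct DefaultTag;
trait DefaultKind {
    fn optional_kind(&self) -> DefaultTag;
}
impl<'a, T> DefaultKind for &'a T {
    fn optional_kind(&self) -> DefaultTag { DefaultTag }
}
impl DefaultTag {
    fn len<T>(self, _this: &T) -> usize { 1 }
    // Caller must ensure `*place` points at writable, unused capacity.
    unsafe fn write_and_advance<T>(self, this: T, place: &mut *mut T) {
        unsafe {
            (*place).write(this);
            *place = (*place).add(1);
        }
    }
}

// Hand-written stand-in for what optional_vec![a, b, c] would expand to.
fn expanded(a: String, b: Option<String>, c: String) -> Vec<String> {
    // Autoref picks OptionTag for `b` and DefaultTag for `a` and `c`.
    let (a_tag, b_tag, c_tag) = (
        (&a).optional_kind(),
        (&b).optional_kind(),
        (&c).optional_kind(),
    );
    let len = a_tag.len(&a) + b_tag.len(&b) + c_tag.len(&c);
    let mut result: Vec<String> = Vec::with_capacity(len);
    let mut next_element = result.as_mut_ptr();
    // SAFETY: capacity is at least `len`, and exactly `len` elements are
    // written before `set_len(len)` is called.
    unsafe {
        a_tag.write_and_advance(a, &mut next_element);
        b_tag.write_and_advance(b, &mut next_element);
        c_tag.write_and_advance(c, &mut next_element);
        result.set_len(len);
    }
    result
}

fn main() {
    let v = expanded("ann".to_string(), None, "cy".to_string());
    println!("{:?}", v);
}
```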

It may be tempting to try to convert this immediately into a macro, but we still have one unsolved problem: our macro needs to generate identifiers. This may not be obvious, but what if we pass optional_vec![1, Some(2), 3]? We need to generate the bindings for the match (in our case, (a, b, c) => ...) and the tag names ((a_tag, b_tag, c_tag)).

Unfortunately, generating names is not something macro_rules! can do in today's Rust. Fortunately, there is an excellent crate, paste (another one from dtolnay!), a small proc-macro that allows you to do that. It is even available on the playground!

However, we need a series of identifiers. That can be done with tt-munching, by repeatedly adding some letter (I used a), so you get a, aa, aaa, ... you get the idea.

#[doc(hidden)]
pub mod reexports {
    pub use std::vec::Vec;
    
    pub use paste::paste;
}

#[macro_export]
macro_rules! optional_vec {
    // Empty case
    { @generate_idents
        exprs = []
        processed_exprs = [$($e:expr,)*]
        match_bindings = [$($binding:ident)*]
        tags = [$($tag:ident)*]
    } => {{
        use $crate::autoref_specialization::{DefaultKind as _, OptionKind as _};
        match ($($e,)*) {
            ($($binding,)*) => {
                let ($($tag,)*) = (
                    $((&$binding).optional_kind(),)*
                );
                
                let len = 0 $(+ $tag.len(&$binding))*;
                let mut result = $crate::reexports::Vec::with_capacity(len);
                
                let mut next_element = result.as_mut_ptr();
                unsafe {
                    $($tag.write_and_advance($binding, &mut next_element);)*
                    
                    result.set_len(len);
                }
                
                result
            }
        }
    }};
    
    { @generate_idents
        exprs = [$e:expr, $($rest:expr,)*]
        processed_exprs = [$($processed_exprs:tt)*]
        match_bindings = [$first_binding:ident $($bindings:ident)*]
        tags = [$($tags:ident)*]
    } => {
        $crate::reexports::paste! {
            $crate::optional_vec! { @generate_idents
                exprs = [$($rest,)*]
                processed_exprs = [$($processed_exprs)* $e,]
                match_bindings = [
                    [< $first_binding a >]
                    $first_binding
                    $($bindings)*
                ]
                tags = [
                    [< $first_binding a_tag >]
                    $($tags)*
                ]
            }
        }
    };
    
    // Entry
    [$e:expr $(, $exprs:expr)* $(,)?] => {
        $crate::optional_vec! { @generate_idents
            exprs = [$($exprs,)*]
            processed_exprs = [$e,]
            match_bindings = [__optional_vec_a]
            tags = [__optional_vec_a_tag]
        }
    };
}

Playground.

Chayim Friedman answered Oct 23 '22 20:10


If only one variable is optional and the other values are Copy (as with u32, since both closures use a and c):

b.map(|b| vec![a, b, c]).unwrap_or_else(|| vec![a, c]);

Playground

Netwave answered Oct 23 '22 18:10


I can also personally recommend

let mut v = vec![a, c];
v.extend(b);

Short and clear. Note that b, if present, ends up at the end of the vector rather than between a and c.
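A quick check of the resulting element order, with a variant that keeps b in the middle. The function names appended and in_order are made up for this sketch; both rely on Option<T> implementing IntoIterator, which yields zero or one item.

```rust
// The form from this answer: `b` is appended after `a` and `c`.
fn appended(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    let mut v = vec![a, c];
    v.extend(b); // adds the value if Some, nothing if None
    v
}

// Hypothetical variant that preserves the a, b, c order.
fn in_order(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    let mut v = vec![a];
    v.extend(b);
    v.push(c);
    v
}

fn main() {
    println!("{:?}", appended(1, Some(2), 3)); // [1, 3, 2]
    println!("{:?}", in_order(1, Some(2), 3)); // [1, 2, 3]
}
```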

DreamConspiracy answered Oct 23 '22 18:10


Sometimes the straightforward solution is the best:

fn jim_power(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    let mut acc = Vec::with_capacity(3);
    acc.push(a);
    if let Some(b) = b {
        acc.push(b);
    }
    acc.push(c);
    acc
}

fn ys_iii(
    some_person: String,
    best_man: Option<String>,
    a_third_person: String,
    another_opt: Option<String>,
) -> Vec<String> {
    let mut acc = Vec::with_capacity(4);
    acc.push(some_person);
    if let Some(x) = best_man {
        acc.push(x);
    }
    acc.push(a_third_person);
    if let Some(x) = another_opt {
        acc.push(x);
    }
    acc
}

Stargateur answered Oct 23 '22 19:10