Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I ensure that a Rust vector only contains alternating types?

I have a set of data that alternates between A and B. These are all valid choices:

  • A -> B -> A
  • A -> B -> A -> B
  • B -> A -> B
  • B -> A -> B -> A

I want to leverage the type system to make sure the alternating property is checked at compile time while maintaining good performance.

Solution 1: linked list

struct A {
    // data
    next: Option<B>,
}

struct B {
    // data
    next: Option<Box<A>>,
}

The problem is that the performance of this data structure will be poor at best. Linked lists have frequent cache misses, and for iterating the data structure this is quite bad.

Solution 2: Vec + enum

enum Types {
    A(DataA),
    B(DataB),
}

type Data = Vec<Types>;

With this solution, cache locality is much better, so yay for performance. However, this does not prevent putting 2 As side-by-side. There is also the fact that one needs to check the type at each iteration, while it is not needed because of the informal definition.

Solution 3: Combination

struct A {
    // data, default in first link = empty
    b: Option<B>,
}

struct B {
    // data
}

type Data = Vec<A>;

This combines the cache locality of the Vec with the type verification of the linked list. It is quite ugly, and one needs to check the first value to verify if it really is an A, or an empty container for the next B.

The question

Is there a data structure that allows compile-time type verification, while maintaining cache locality and avoiding extra allocation?

like image 262
AdminXVII Avatar asked Aug 06 '19 14:08

AdminXVII


2 Answers

To store alternating types in a way that the type system enforces and has reasonable efficiency, you can use a tuple: Vec<(X, Y)>.

Your situation also requires

  • Storing an extra leading value in an Option to handle starting with Y
  • Storing an extra trailing value in an Option to handle ending with X
use either::Either; // 1.5.2
use std::iter;

#[derive(Debug, Default)]
struct Data<X, Y> {
    head: Option<Y>,
    pairs: Vec<(X, Y)>,
    tail: Option<X>,
}

impl<X, Y> Data<X, Y> {
    fn iter(&self) -> impl Iterator<Item = Either<&X, &Y>> {
        let head = self.head.iter().map(Either::Right);

        let pairs = self.pairs.iter().flat_map(|(a, b)| {
            let a = iter::once(Either::Left(a));
            let b = iter::once(Either::Right(b));
            a.chain(b)
        });

        let tail = self.tail.iter().map(Either::Left);

        head.chain(pairs).chain(tail)
    }
}

That being said, you are going to have ergonomic issues somewhere. For example, you can't just push an Either<X, Y> because the previously pushed value might be of the same type. Creating the entire structure at once might be the simplest direction:

#[derive(Debug)]
struct A;
#[derive(Debug)]
struct B;

fn main() {
    let data = Data {
        head: Some(B),
        pairs: vec![(A, B)],
        tail: None,
    };

    println!("{:?}", data.iter().collect::<Vec<_>>());
}
like image 157
Shepmaster Avatar answered Oct 29 '22 14:10

Shepmaster


Is there a data structure that allows compile-time type verification, while maintaining cache locality and avoiding extra allocation?

You can use Rust's type system to enforce that items of each type are added in alternating order. The general strategy is to capture the type of the first item and also the previous item in the type of the whole structure and make different methods available according to the "current" type. When the previous item was an X, only the methods for adding a Y will be available, and vice versa.

I'm using two Vecs rather than a Vec of tuples. Depending on your data types, this could result in better memory adjacency, but that really depends on how you end up iterating.

use std::marker::PhantomData;
use std::fmt;

struct Left;
struct Right;
struct Empty;

struct AlternatingVec<L, R, P = Empty, S = Empty> {
    lefts: Vec<L>,
    rights: Vec<R>,
    prev: PhantomData<P>,
    start: PhantomData<S>,
}

impl<L, R> AlternatingVec<L, R, Empty, Empty> {
    pub fn new() -> Self {
        AlternatingVec {
            lefts: Vec::new(),
            rights: Vec::new(),
            prev: PhantomData,
            start: PhantomData,
        } 
    }
}

The types Left, Right and Empty are for "tagging" if the previous and start values correspond to values in the left or right collection, or if that collection is empty. Initially both collections are empty, so both P (the previously added value) and S (the start value) are Empty.

Next, a utility method for changing the types. It doesn't look like it does much, but we'll use it in combination with type inference to produce copies of the data structure, but with the phantom types changed.

impl<L, R, P, S> AlternatingVec<L, R, P, S> {    
    fn change_type<P2, S2>(self) -> AlternatingVec<L, R, P2, S2> {
        AlternatingVec {
            lefts: self.lefts,
            rights: self.rights,
            prev: PhantomData,
            start: PhantomData,
        } 
    }
}

In practice, the compiler is smart enough that this method does nothing at runtime.

These two traits define operations on the left and right collections respectively:

trait LeftMethods<L, R, S> {
    fn push_left(self, val: L) -> AlternatingVec<L, R, Left, S>;
}

trait RightMethods<L, R, S> {
    fn push_right(self, val: R) -> AlternatingVec<L, R, Right, S>;
}

We will implement those for the times we want them to be callable: RightMethods should only be available if the previous item was a "left" or if there are no items added so far. LeftMethods should be implemented if the previous items was a "right" or if there are no items added so far.

impl<L, R> LeftMethods<L, R, Left> for AlternatingVec<L, R, Empty, Empty> {
    fn push_left(mut self, val: L) -> AlternatingVec<L, R, Left, Left> {
        self.lefts.push(val);
        self.change_type()
    }
}

impl<L, R, S> LeftMethods<L, R, S> for AlternatingVec<L, R, Right, S> {
    fn push_left(mut self, val: L) -> AlternatingVec<L, R, Left, S> {
        self.lefts.push(val);
        self.change_type()
    }
}

impl<L, R> RightMethods<L, R, Right> for AlternatingVec<L, R, Empty, Empty> {
    fn push_right(mut self, val: R) -> AlternatingVec<L, R, Right, Right> {
        self.rights.push(val);
        self.change_type()
    }
}

impl<L, R, S> RightMethods<L, R, S> for AlternatingVec<L, R, Left, S> {
    fn push_right(mut self, val: R) -> AlternatingVec<L, R, Right, S> {
        self.rights.push(val);
        self.change_type()
    }
}

These methods don't do much except call push on the correct inner Vec, and then use change_type to make the type reflect the signature.

The compiler forces you to call push_left and push_right alternately:

fn main() {
    let v = AlternatingVec::new()
        .push_left(true)
        .push_right(7)
        .push_left(false)
        .push_right(0)
        .push_left(false);
}

This complex structure leads to a lot more work in general. For example, Debug is fiddly to implement in a nice way. I made a version with a Debug impl, but it's getting a bit too long for Stack Overflow. You can see it here:

  • https://gist.github.com/peterjoel/2ffe8b7f5ad7c649f61c580ac7dabc67
like image 26
Peter Hall Avatar answered Oct 29 '22 15:10

Peter Hall