I have a set of data that alternates between A and B. These are all valid choices:
A -> B -> A
A -> B -> A -> B
B -> A -> B
B -> A -> B -> A
I want to leverage the type system to make sure the alternating property is checked at compile time while maintaining good performance.
struct A {
// data
next: Option<B>,
}
struct B {
// data
next: Option<Box<A>>,
}
The problem is that the performance of this data structure will be poor at best. Linked lists have frequent cache misses, and for iterating the data structure this is quite bad.
enum Types {
A(DataA),
B(DataB),
}
type Data = Vec<Types>;
With this solution, cache locality is much better, so yay for performance. However, this does not prevent putting 2 A
s side-by-side. There is also the fact that one needs to check the type at each iteration, while it is not needed because of the informal definition.
struct A {
// data, default in first link = empty
b: Option<B>,
}
struct B {
// data
}
type Data = Vec<A>;
This combines the cache locality of the Vec
with the type verification of the linked list. It is quite ugly, and one needs to check the first value to verify if it really is an A
, or an empty container for the next B
.
Is there a data structure that allows compile-time type verification, while maintaining cache locality and avoiding extra allocation?
To store alternating types in a way that the type system enforces and has reasonable efficiency, you can use a tuple: Vec<(X, Y)>
.
Your situation also requires
Option
to handle starting with Y
Option
to handle ending with X
use either::Either; // 1.5.2
use std::iter;
#[derive(Debug, Default)]
struct Data<X, Y> {
head: Option<Y>,
pairs: Vec<(X, Y)>,
tail: Option<X>,
}
impl<X, Y> Data<X, Y> {
fn iter(&self) -> impl Iterator<Item = Either<&X, &Y>> {
let head = self.head.iter().map(Either::Right);
let pairs = self.pairs.iter().flat_map(|(a, b)| {
let a = iter::once(Either::Left(a));
let b = iter::once(Either::Right(b));
a.chain(b)
});
let tail = self.tail.iter().map(Either::Left);
head.chain(pairs).chain(tail)
}
}
That being said, you are going to have ergonomic issues somewhere. For example, you can't just push
an Either<X, Y>
because the previously pushed value might be of the same type. Creating the entire structure at once might be the simplest direction:
#[derive(Debug)]
struct A;
#[derive(Debug)]
struct B;
fn main() {
let data = Data {
head: Some(B),
pairs: vec![(A, B)],
tail: None,
};
println!("{:?}", data.iter().collect::<Vec<_>>());
}
Is there a data structure that allows compile-time type verification, while maintaining cache locality and avoiding extra allocation?
You can use Rust's type system to enforce that items of each type are added in alternating order. The general strategy is to capture the type of the first item and also the previous item in the type of the whole structure and make different methods available according to the "current" type. When the previous item was an X
, only the methods for adding a Y
will be available, and vice versa.
I'm using two Vec
s rather than a Vec
of tuples. Depending on your data types, this could result in better memory adjacency, but that really depends on how you end up iterating.
use std::marker::PhantomData;
use std::fmt;
struct Left;
struct Right;
struct Empty;
struct AlternatingVec<L, R, P = Empty, S = Empty> {
lefts: Vec<L>,
rights: Vec<R>,
prev: PhantomData<P>,
start: PhantomData<S>,
}
impl<L, R> AlternatingVec<L, R, Empty, Empty> {
pub fn new() -> Self {
AlternatingVec {
lefts: Vec::new(),
rights: Vec::new(),
prev: PhantomData,
start: PhantomData,
}
}
}
The types Left
, Right
and Empty
are for "tagging" if the previous and start values correspond to values in the left or right collection, or if that collection is empty. Initially both collections are empty, so both P
(the previously added value) and S
(the start value) are Empty
.
Next, a utility method for changing the types. It doesn't look like it does much, but we'll use it in combination with type inference to produce copies of the data structure, but with the phantom types changed.
impl<L, R, P, S> AlternatingVec<L, R, P, S> {
fn change_type<P2, S2>(self) -> AlternatingVec<L, R, P2, S2> {
AlternatingVec {
lefts: self.lefts,
rights: self.rights,
prev: PhantomData,
start: PhantomData,
}
}
}
In practice, the compiler is smart enough that this method does nothing at runtime.
These two traits define operations on the left and right collections respectively:
trait LeftMethods<L, R, S> {
fn push_left(self, val: L) -> AlternatingVec<L, R, Left, S>;
}
trait RightMethods<L, R, S> {
fn push_right(self, val: R) -> AlternatingVec<L, R, Right, S>;
}
We will implement those for the times we want them to be callable: RightMethods
should only be available if the previous item was a "left" or if there are no items added so far. LeftMethods
should be implemented if the previous items was a "right" or if there are no items added so far.
impl<L, R> LeftMethods<L, R, Left> for AlternatingVec<L, R, Empty, Empty> {
fn push_left(mut self, val: L) -> AlternatingVec<L, R, Left, Left> {
self.lefts.push(val);
self.change_type()
}
}
impl<L, R, S> LeftMethods<L, R, S> for AlternatingVec<L, R, Right, S> {
fn push_left(mut self, val: L) -> AlternatingVec<L, R, Left, S> {
self.lefts.push(val);
self.change_type()
}
}
impl<L, R> RightMethods<L, R, Right> for AlternatingVec<L, R, Empty, Empty> {
fn push_right(mut self, val: R) -> AlternatingVec<L, R, Right, Right> {
self.rights.push(val);
self.change_type()
}
}
impl<L, R, S> RightMethods<L, R, S> for AlternatingVec<L, R, Left, S> {
fn push_right(mut self, val: R) -> AlternatingVec<L, R, Right, S> {
self.rights.push(val);
self.change_type()
}
}
These methods don't do much except call push
on the correct inner Vec
, and then use change_type
to make the type reflect the signature.
The compiler forces you to call push_left
and push_right
alternately:
fn main() {
let v = AlternatingVec::new()
.push_left(true)
.push_right(7)
.push_left(false)
.push_right(0)
.push_left(false);
}
This complex structure leads to a lot more work in general. For example, Debug
is fiddly to implement in a nice way. I made a version with a Debug
impl, but it's getting a bit too long for Stack Overflow. You can see it here:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With