I would like to split the output of an object that implements Iterator<(A,B)>
into two objects that implement Iterator<A>
and Iterator<B>
. Since one of the outputs could be iterated more than the other, I'll need to buffer up the output of the Iterator<(A,B)>
(because I can't rely on the Iterator<(A,B)>
being cloneable.) The problem is that the iterator could be infinite, so I can't simply collect the output of the iterator into two buffers and return iterators over the two buffers.
So it seems that I'll need to hold buffers of the A
and B
objects, and whenever one of the buffers is empty I'll fill it with samples from the Iterator<(A,B)>
object. This means that I'll need two iterable structs that have mutable references to the input iterator (since both of them will need to call next()
on the input to fill up the buffers), which is impossible.
So, is there any way to accomplish this in a safe way?
This is possible. As you identified you need mutable references to the base iterator from both handles, which is possible using a type with "internal mutability", that is, one that uses unsafe
code internally to expose a safe API for acquiring a &mut
to aliasable data (i.e. contained in a &
) by dynamically enforcing the invariants that the compiler normally enforces at compile time outside unsafe
.
I'm assuming you're happy to keep the two iterators on a single thread1, so, in this case, we want a RefCell
. We also need to be able to have access to the RefCell
from the two handles, entailing storing either a &RefCell<...>
or an Rc<RefCell<...>>
. The former would be too restrictive, as it would only allow us to use the pair of iterators in and below the stack frame in which the RefCell
is created, while we want to be able to freely pass the iterators around, so Rc
it is.
In summary, we're basically going to be storing an Rc<RefCell<Iterator<(A,B)>>>
, there's just the question of buffering. The right tool for the job here is a RingBuf
since we want efficient push/pop at the front and back. Thus, the thing we're sharing (i.e. inside the RefCell
) could look like:
struct SharedInner<A, B, It> {
iter: It,
first: RingBuf<A>,
second: RingBuf<B>,
}
We can abbreviate the type actually being shared as type Shared<A, B, It> = Rc<RefCell<SharedInner<A, B, It>>>;
, which allows us to define the iterators:
struct First<A, B, It> {
data: Shared<A, B, It>
}
impl Iterator<A> for First<A,B,It> {
fn next(&mut self) -> Option<A> {
// ...
}
}
To implement next
the first thing to do is get a &mut
to the SharedInner
, via self.data.borrow_mut();
. And then get an element out of it: check the right buffer, or otherwise get a new element from iter
(remembering to buffer the left-over B
):
let mut inner = self.data.borrow_mut();
inner.first.pop_front().or_else(|| {
inner.iter.next().map(|(a,b)| {
inner.second.push(b);
a
})
})
Docs: RingBuf.pop_front
, Option.or_else
.
The iterator for the other side is similar. In total:
use std::cell::RefCell;
use std::collections::{Deque, RingBuf};
use std::rc::Rc;
struct SharedInner<A, B, It> {
iter: It,
first: RingBuf<A>,
second: RingBuf<B>
}
type Shared<A, B, It> = Rc<RefCell<SharedInner<A, B, It>>>;
struct First<A, B, It> {
data: Shared<A, B, It>
}
impl<A,B, It: Iterator<(A,B)>> Iterator<A> for First<A, B, It> {
fn next(&mut self) -> Option<A> {
let mut inner = self.data.borrow_mut();
// try to get one from the stored data
inner.first.pop_front().or_else(||
// nothing stored, we need a new element.
inner.iter.next().map(|(a, b)| {
inner.second.push(b);
a
}))
}
}
struct Second<A, B, It> {
data: Shared<A, B, It>
}
impl<A,B, It: Iterator<(A,B)>> Iterator<B> for Second<A,B,It> {
fn next(&mut self) -> Option<B> {
let mut inner = self.data.borrow_mut();
inner.second.pop_front().or_else(|| {
inner.iter.next().map(|(a, b)| {
inner.first.push(a);
b
})
})
}
}
fn split<A, B, It: Iterator<(A,B)>>(it: It) -> (First<A, B, It>,
Second<A, B, It>) {
let data = Rc::new(RefCell::new(SharedInner {
iter: it,
first: RingBuf::new(),
second: RingBuf::new(),
}));
(First { data: data.clone() }, Second { data: data })
}
fn main() {
let pairs = range(1u32, 10 + 1).map(|x| (x, 1.0 / x as f64));
let (mut first, mut second) = split(pairs);
println!("first:");
for x in first.by_ref().take(3) {
println!(" {}", x);
}
println!("second:");
for y in second.by_ref().take(5) {
if y < 0.2 { break }
println!(" {}", y);
}
let a = first.collect::<Vec<u32>>();
let b = second.collect::<Vec<f64>>();
println!("a {}\nb {}", a, b);
}
which prints
first:
1
2
3
second:
1
0.5
0.333333
0.25
0.2
a [4, 5, 6, 7, 8, 9, 10]
b [0.166667, 0.142857, 0.125, 0.111111, 0.1]
playpen.
There's various ways this could be optimised, e.g. when fetching in First
, only buffer the left-over B
if a Second
handle exists.
1 If you were looking to run them in separate threads just replace the RefCell
with a Mutex
and the Rc
with an Arc
, and add the necessary bounds.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With