 

Iterate over collection. Drop it as soon as Iterator is dropped

I have collections dumped on disk. When requested, these collections should be retrieved (no problem) and an iterator should be built for them that returns references to the retrieved values.

After the iterator is dropped, I do not need the collection anymore. I want it to be dropped, too.

What I have tried so far:

  1. The Iterator owns the collection. This made the most sense to me, but it is not possible; I am not quite sure why. Some say the Iterator trait's method signature for next is the problem. (example)

  2. Reference Counting: The Retriever returns an Rc<Vec<usize>>. I ran into the same problems as with the owning iterator. (example)

  3. Letting the retriever own the collection and handing out a reference to it. I tried to implement the retriever with interior mutability (RefCell<HashMap>), but I cannot return references into the HashMap with long enough lifetimes.

I see two basic possibilities with this.

  1. The retriever transfers ownership. Then the Iterator would need to own the data. Something along the lines of:

    use std::slice::Iter;
    
    fn retrieve(id: usize) -> Vec<usize> {
        //Create Data out of the blue (or disk, or memory, or network. I don't care)
        //Move the data out. Transfer ownership
        let data = vec![0, 1, 2, 3];
        data
    }
    
    fn consume_iterator<'a, TIterator: Iterator<Item=&'a usize>>(iterator: TIterator) {
        for i in iterator {
            println!("{}", i);
        }
    }
    
    fn handler<'a>(id: usize) -> Iter<'a, usize> {
        //handler now owns the vector.
        //I now want to build an owning iterator...
        //This does of course not compile, as the vector will be dropped at the end of this function
        retrieve(id).iter()
    }
    
    fn main() {
        consume_iterator(handler(0))
    }
    
  2. The retriever owns the collection. But then two new problems arise:

    1. How can I drop the data when the iterator is out of range?
    2. How do I tell the borrow-checker that I will own the collection long enough?

    use std::cell::{Ref, RefCell};
    
    struct Retriever {
        //Own the data. But I want it to be dropped as soon as the references to it go out of scope.
        data: RefCell<Vec<usize>>
    }
    
    impl Retriever{
    
        fn retrieve<'a>(&'a self, id: usize) -> Ref<'a, Vec<usize>> {
            //Create Data out of the blue (or disk, or memory, or network. I don't care)
            //Now data can be stored internally and a reference to it can be supplied.
            //The mutable borrow is a temporary that ends with this statement;
            //keeping it in a variable would make the borrow() below panic.
            *self.data.borrow_mut() = vec![0, 1, 2, 3];
            self.data.borrow()
        }
    
    }
    
    fn consume_iterator<'a, TIterator: Iterator<Item=&'a usize>>(iterator: TIterator) {
        for i in iterator {
            println!("{}", i);
        }
    }
    
    
    fn handler<'a>(ret: &'a Retriever, id: usize) -> IterWrapper<'a> {
        //handler now has a reference to the collection
        //So just call iter()? Nope. Lifetime issues.
        ret.retrieve(id).iter()        
    }
    
    fn main() {
        let retriever = Retriever{data: RefCell::new(Vec::new())};
        consume_iterator(handler(&retriever, 0))
    }
    

I feel a bit lost here and am probably overlooking something obvious.

JDemler asked Jul 26 '16 09:07


2 Answers

The Iterator owns the collection. [or joint ownership via reference-counting]

ContainerIterator { 
    data: data,
    iter: data.iter(),
}

No, you cannot have a value and a reference to that value in the same struct.
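
If shared ownership is still wanted, one possible way around the self-referential struct is to store a position instead of a borrowed slice iterator and to hand out clones of the items. This is only a sketch: the name RcIter is made up for illustration, and it assumes the items are cheap to clone (as usize is).

```rust
use std::rc::Rc;

// Hypothetical iterator that shares ownership of the collection and
// remembers a position instead of holding a borrowed slice iterator.
struct RcIter<T> {
    data: Rc<Vec<T>>,
    pos: usize,
}

impl<T> RcIter<T> {
    fn new(data: Rc<Vec<T>>) -> Self {
        RcIter { data, pos: 0 }
    }
}

impl<T: Clone> Iterator for RcIter<T> {
    // Items are yielded by value, so nothing borrows from `self`.
    type Item = T;

    fn next(&mut self) -> Option<T> {
        let item = self.data.get(self.pos).cloned();
        if item.is_some() {
            self.pos += 1;
        }
        item
    }
}

fn main() {
    let data = Rc::new(vec![0usize, 1, 2, 3]);
    let iter = RcIter::new(Rc::clone(&data));
    drop(data); // the iterator's handle keeps the vector alive
    for x in iter {
        println!("{}", x);
    }
    // The loop consumed `iter`, which held the last handle, so the vector is freed.
}
```

The vector is freed when the last Rc handle goes away, which here is the moment the iterator itself is dropped.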

Letting the retriever own the collection and handing out a reference to it.

No, you cannot return references to items owned by the iterator.
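
To illustrate where the limit lies: an ordinary method can lend out a reference into data that the struct owns, because its return type is tied to the borrow of self. The Item type of Iterator cannot name that borrow, so the same body cannot be used to implement the trait. A minimal sketch, with OwningCursor and next_ref as made-up names:

```rust
// Hypothetical cursor that owns its data and lends out one reference at a time.
struct OwningCursor {
    data: Vec<usize>,
    pos: usize,
}

impl OwningCursor {
    fn new(data: Vec<usize>) -> Self {
        OwningCursor { data, pos: 0 }
    }

    // The returned reference borrows from `self`. `Iterator::next` cannot
    // express this, since `Item` has no access to the lifetime of `&mut self`.
    fn next_ref(&mut self) -> Option<&usize> {
        let item = self.data.get(self.pos);
        if item.is_some() {
            self.pos += 1;
        }
        item
    }
}

fn main() {
    let mut cursor = OwningCursor::new(vec![0, 1, 2, 3]);
    // Each reference must be released before the next call, so a `while let`
    // loop works, but `for` loops and iterator adapters are not available.
    while let Some(x) = cursor.next_ref() {
        println!("{}", x);
    }
}
```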

As commenters have said, use IntoIter to transfer ownership of the items to the iterator and then hand them out as the iterated values:

use std::vec::IntoIter;

struct ContainerIterator {
    iter: IntoIter<usize>,
}

impl Iterator for ContainerIterator {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next()
    }
}

fn main() {
    let data = vec![0, 1, 2, 3];
    let cont = ContainerIterator { iter: data.into_iter() };

    for x in cont {
        println!("Hi {}", x)
    }
}

If you must return references... then you need to keep the thing that owns them around for the entire time that all the references might be around.

How can I drop the data when the iterator is out of range?

By not using the value any more:

fn main() {
    {
        let loaded_from_disk = vec![0, 1, 2, 3];
        for i in &loaded_from_disk {
            println!("{}", i)
        }
        // loaded_from_disk goes out of scope and is dropped. Nothing to *do*, per se.
    }
}
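
To make the drop observable, the collection can be wrapped in a type with a Drop implementation that sets a flag. This is just an illustrative sketch; Loaded and sum_loaded are hypothetical names that do not appear in the question.

```rust
use std::cell::Cell;

// Hypothetical wrapper whose only purpose is to make the drop observable.
struct Loaded<'f> {
    values: Vec<usize>,
    dropped: &'f Cell<bool>,
}

impl Drop for Loaded<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Iterates by reference inside an inner scope and returns the sum of the values.
fn sum_loaded(dropped: &Cell<bool>) -> usize {
    let total: usize;
    {
        let loaded = Loaded { values: vec![0, 1, 2, 3], dropped };
        total = loaded.values.iter().sum();
        assert!(!dropped.get()); // still alive right after iterating
    } // `loaded` goes out of scope and is dropped here
    assert!(dropped.get());
    total
}

fn main() {
    let dropped = Cell::new(false);
    println!("sum = {}", sum_loaded(&dropped));
    println!("dropped = {}", dropped.get());
}
```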

How do I tell the borrow-checker that I will own the collection long enough?

By owning the collection long enough. There's no secret handshake that the Rust Illuminati use with the borrow checker. The code only needs to be structured such that the thing that is borrowed doesn't become invalid while the borrow is outstanding. You can't move it (changing the memory address) or drop it (freeing the memory).
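
One way to structure the code like that, sketched here as an assumption rather than as the only option, is to turn the handler inside out: it owns the collection and passes a borrowing iterator to a closure, so the collection lives exactly as long as the closure runs. The name with_collection is made up, and retrieve is a stand-in for the question's function of the same name.

```rust
// Stand-in for loading from disk (mirrors the question's `retrieve`).
fn retrieve(_id: usize) -> Vec<usize> {
    vec![0, 1, 2, 3]
}

// Hypothetical handler: owns the collection for exactly as long as `consumer` runs.
fn with_collection<R>(
    id: usize,
    consumer: impl FnOnce(std::slice::Iter<'_, usize>) -> R,
) -> R {
    let data = retrieve(id);
    consumer(data.iter())
    // `data` is dropped when this function returns, after the closure has finished
}

fn main() {
    // The closure sees an ordinary iterator over `&usize`.
    with_collection(0, |iter| {
        for i in iter {
            println!("{}", i);
        }
    });

    // Results that do not borrow from the collection can be returned.
    let total: usize = with_collection(0, |iter| iter.sum());
    println!("total = {}", total);
}
```

The result type R is declared outside the closure's borrow, so the compiler rejects any attempt to let a reference into the collection escape.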

Shepmaster answered Nov 19 '22 23:11


I was finally able to implement a relatively satisfying solution:

Hiding the mutability of iterators inside Cells:

pub trait OwningIterator<'a> {
    type Item;
    fn next(&'a self) -> Option<Self::Item>;
}

A struct now needs its position wrapped in a Cell to allow iteration without mutable access. As an example, here is the implementation of a struct that both owns and can iterate over an Arc<Vec<T>>:

use std::cell::Cell;
use std::sync::Arc;

pub struct ArcIter<T> {
    data: Arc<Vec<T>>,
    pos: Cell<usize>,
}

impl<'a, T: 'a> OwningIterator<'a> for ArcIter<T> {
    type Item = &'a T;

    fn next(&'a self) -> Option<Self::Item> {
        if self.pos.get() < self.data.len() {
            self.pos.set(self.pos.get() + 1);
            return Some(&self.data[self.pos.get() - 1]);
        }  
        None
    }
}

As I was able to hide these kinds of iterators behind interfaces and let the user handle only "real" iterators, I feel this is an acceptable deviation from the standard.
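
The post does not show how the hiding is done. One possible reading, given purely as an assumption, is a small adapter that borrows the owning iterator and implements the standard Iterator trait on top of it. The trait and ArcIter are repeated from above so the sketch compiles on its own; Borrowed is a made-up name.

```rust
use std::cell::Cell;
use std::sync::Arc;

pub trait OwningIterator<'a> {
    type Item;
    fn next(&'a self) -> Option<Self::Item>;
}

pub struct ArcIter<T> {
    data: Arc<Vec<T>>,
    pos: Cell<usize>,
}

impl<'a, T: 'a> OwningIterator<'a> for ArcIter<T> {
    type Item = &'a T;

    fn next(&'a self) -> Option<Self::Item> {
        if self.pos.get() < self.data.len() {
            self.pos.set(self.pos.get() + 1);
            return Some(&self.data[self.pos.get() - 1]);
        }
        None
    }
}

// Hypothetical adapter: borrows an owning iterator and exposes a standard `Iterator`.
pub struct Borrowed<'a, I: OwningIterator<'a>> {
    inner: &'a I,
}

impl<'a, I: OwningIterator<'a>> Iterator for Borrowed<'a, I> {
    type Item = <I as OwningIterator<'a>>::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // `self.inner` is a shared reference that lives for `'a`, so it can be
        // copied out and used for the `&'a self` receiver.
        OwningIterator::next(self.inner)
    }
}

fn main() {
    let owner = ArcIter { data: Arc::new(vec![0usize, 1, 2, 3]), pos: Cell::new(0) };
    let iter = Borrowed { inner: &owner };
    for x in iter {
        println!("{}", x);
    }
}
```

Note that the owner still has to outlive the adapter, so this moves the borrow rather than removing it.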

Thanks to everyone who contributed with ideas that ultimately helped me to find that solution.

JDemler answered Nov 19 '22 23:11