
How can I store a Chars iterator in the same struct as the String it is iterating on?

Tags:

rust

I am just beginning to learn Rust and I’m struggling to handle the lifetimes.

I’d like to have a struct with a String in it which will be used to buffer lines from stdin. Then I’d like to have a method on the struct which returns the next character from the buffer, or if all of the characters from the line have been consumed it will read the next line from stdin.

The documentation says that Rust strings aren’t indexable by character because that is inefficient with UTF-8. As I’m accessing the characters sequentially it should be fine to use an iterator. However, as far as I understand, iterators in Rust are tied to the lifetime of the thing they’re iterating and I can’t work out how I could store this iterator in the struct alongside the String.

Here is the pseudo-Rust that I’d like to achieve. Obviously it doesn’t compile.

struct CharGetter {
    /* Buffer containing one line of input at a time */
    input_buf: String,
    /* The position within input_buf of the next character to
     * return. This needs a lifetime parameter. */
    input_pos: std::str::Chars
}

impl CharGetter {
    fn next(&mut self) -> Result<char, io::Error> {
        loop {
            match self.input_pos.next() {
                /* If there is still a character left in the input
                 * buffer then we can just return it immediately. */
                Some(n) => return Ok(n),
                /* Otherwise get the next line */
                None => {
                    io::stdin().read_line(&mut self.input_buf)?;
                    /* Reset the iterator to the beginning of the
                     * line. Obviously this doesn’t work because it’s
                     * not obeying the lifetime of input_buf */
                    self.input_pos = self.input_buf.chars();
                }
            }
        }
    }
}

I am trying to do the Synacor challenge. This involves implementing a virtual machine where one of the opcodes reads a character from stdin and stores it in a register. I have this part working fine. The documentation states that whenever the program inside the VM reads a character it will keep reading until it reads a whole line. I wanted to take advantage of this to add a “save” command to my implementation. That means that whenever the program asks for a character, I will read a line from the input. If the line is “save”, I will save the state of the VM and then continue to get another line to feed to the VM. Each time the VM executes the input opcode, I need to be able to give it one character at a time from the buffered line until the buffer is depleted.

My current implementation is here. My plan was to add input_buf and input_pos to the Machine struct which represents the state of the VM.

Neil Roberts asked May 13 '17 10:05



1 Answer

As thoroughly described in "Why can't I store a value and a reference to that value in the same struct?", in general you can't do this because it truly is unsafe. When you move memory, you invalidate references. This is why a lot of people use Rust: to avoid invalid references, which lead to program crashes!

Let's look at your code:

io::stdin().read_line(&mut self.input_buf)?;
self.input_pos = self.input_buf.chars();

Between these two lines, you've left self.input_pos in a bad state: read_line appends to input_buf, which may reallocate the buffer while the old iterator still points into it. If a panic occurs, then the destructor of the object has the opportunity to access invalid memory! Rust is protecting you from an issue that most people never think about.
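To make the hazard concrete, here is a small sketch showing that appending to a String can relocate its heap buffer. The helper name `buffer_moved_after_push` is made up for illustration. Any Chars created before the append would still point at the old location, which is what read_line can do to input_buf.

```rust
/// Appends `extra` to `s` and reports whether the heap buffer
/// ended up at a different address afterwards.
/// (Hypothetical helper, not part of the standard library.)
fn buffer_moved_after_push(s: &mut String, extra: &str) -> bool {
    // Address of the first byte before the append.
    let before = s.as_ptr();
    // May reallocate if the current capacity is too small.
    s.push_str(extra);
    // Compare against the address after the append.
    before != s.as_ptr()
}
```

Whether a particular append actually moves the buffer depends on the allocator, so the helper only reports what happened. The one firm guarantee is in the other direction: no reallocation happens while the new length still fits within the existing capacity.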


As also described in that answer:

There is a special case where the lifetime tracking is overzealous: when you have something placed on the heap. This occurs when you use a Box<T>, for example. In this case, the structure that is moved contains a pointer into the heap. The pointed-at value will remain stable, but the address of the pointer itself will move. In practice, this doesn't matter, as you always follow the pointer.

Some crates provide ways of representing this case, but they require that the base address never move. This rules out mutating vectors, which may cause a reallocation and a move of the heap-allocated values.

Remember that a String is just a vector of bytes with extra preconditions added.
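As a quick illustration of that point, the sketch below converts a String to its underlying Vec<u8> and back. The function names are made up for this example; the extra precondition is that the bytes must be valid UTF-8, which String::from_utf8 checks.

```rust
/// Round-trips a String through its underlying byte vector.
/// (Hypothetical helper for illustration.)
fn round_trip(s: String) -> String {
    // Hands over the String's buffer as plain bytes.
    let bytes: Vec<u8> = s.into_bytes();
    // Re-validates the bytes as UTF-8 before wrapping them again.
    String::from_utf8(bytes).expect("bytes came from a String, so they are valid UTF-8")
}

/// Returns true if `bytes` would be accepted as the contents of a String.
/// (Hypothetical helper for illustration.)
fn is_valid_string(bytes: Vec<u8>) -> bool {
    String::from_utf8(bytes).is_ok()
}
```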

Instead of using one of those crates, we can also roll our own solution, which means we (read: you) get to accept all the responsibility for ensuring that we aren't doing anything wrong.

The trick here is to ensure that the data inside the String never moves and no accidental references are taken.

use std::{mem, str::Chars};

/// I believe this struct to be safe because the String is
/// heap-allocated (stable address) and will never be modified
/// (stable address). `chars` will not outlive the struct, so
/// lying about the lifetime should be fine.
///
/// TODO: What about during destruction?
///       `Chars` shouldn't have a destructor...
struct OwningChars {
    _s: String,
    chars: Chars<'static>,
}

impl OwningChars {
    fn new(s: String) -> Self {
        let chars = unsafe { mem::transmute(s.chars()) };
        OwningChars { _s: s, chars }
    }
}

impl Iterator for OwningChars {
    type Item = char;
    fn next(&mut self) -> Option<Self::Item> {
        self.chars.next()
    }
}

You might even think about putting just this code into a module so that you can't accidentally muck about with the innards.
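A minimal sketch of what that module might look like follows. The module name `owning_chars` is an assumption, and the fields are reordered so that `chars` is declared (and therefore dropped) before the String it points into; that is a cautious choice on my part rather than something the code above requires. The safety argument is the same one as before and is assumed, not proven.

```rust
mod owning_chars {
    use std::{mem, str::Chars};

    /// Both fields are private, so code outside this module can
    /// neither modify the String nor replace the iterator.
    pub struct OwningChars {
        // Declared first so it is dropped before the String it borrows from.
        chars: Chars<'static>,
        // Never touched after construction; it only keeps the buffer alive.
        _s: String,
    }

    impl OwningChars {
        pub fn new(s: String) -> Self {
            // SAFETY (assumed): the String's heap buffer does not move when
            // the String value itself is moved, and nothing in this module
            // ever mutates `_s`, so the pretend-'static borrow stays valid
            // for as long as the struct exists.
            let chars = unsafe { mem::transmute::<Chars<'_>, Chars<'static>>(s.chars()) };
            OwningChars { chars, _s: s }
        }
    }

    impl Iterator for OwningChars {
        type Item = char;

        fn next(&mut self) -> Option<char> {
            self.chars.next()
        }
    }
}
```

From outside the module, the only things you can do are construct the value with `owning_chars::OwningChars::new(line)` and iterate it, which is the point of hiding the innards.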


Here's the same code using the ouroboros crate to create a self-referential struct containing the String and a Chars iterator:

use ouroboros::self_referencing; // 0.4.1
use std::str::Chars;

#[self_referencing]
pub struct IntoChars {
    string: String,
    #[borrows(string)]
    chars: Chars<'this>,
}

// All these implementations are based on what `Chars` implements itself

impl Iterator for IntoChars {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.next())
    }

    #[inline]
    fn count(mut self) -> usize {
        self.with_mut(|me| me.chars.count())
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.with(|me| me.chars.size_hint())
    }

    #[inline]
    fn last(mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.last())
    }
}

impl DoubleEndedIterator for IntoChars {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.next_back())
    }
}

impl std::iter::FusedIterator for IntoChars {}

// And an extension trait for convenience

trait IntoCharsExt {
    fn into_chars(self) -> IntoChars;
}

impl IntoCharsExt for String {
    fn into_chars(self) -> IntoChars {
        IntoCharsBuilder {
            string: self,
            chars_builder: |s| s.chars(),
        }
        .build()
    }
}

Here's the same code using the rental crate (which is no longer maintained) to create a self-referential struct containing the String and a Chars iterator:

#[macro_use]
extern crate rental; // 0.5.5

rental! {
    mod into_chars {
        pub use std::str::Chars;

        #[rental]
        pub struct IntoChars {
            string: String,
            chars: Chars<'string>,
        }
    }
}

use into_chars::IntoChars;

// All these implementations are based on what `Chars` implements itself

impl Iterator for IntoChars {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.next())
    }

    #[inline]
    fn count(mut self) -> usize {
        self.rent_mut(|chars| chars.count())
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.rent(|chars| chars.size_hint())
    }

    #[inline]
    fn last(mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.last())
    }
}

impl DoubleEndedIterator for IntoChars {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.next_back())
    }
}

impl std::iter::FusedIterator for IntoChars {}

// And an extension trait for convenience

trait IntoCharsExt {
    fn into_chars(self) -> IntoChars;
}

impl IntoCharsExt for String {
    fn into_chars(self) -> IntoChars {
        IntoChars::new(self, |s| s.chars())
    }
}
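
For the use case in the question, a self-referential struct may not be needed at all. The sketch below swaps in a different technique from the ones above: it stores a byte offset into the buffer instead of a Chars iterator, and builds a short-lived iterator on each call. The generic `reader` field, the method name `next_char`, and the choice to return Ok(None) at end of input are assumptions made for illustration. The buffer is cleared before each read because read_line appends rather than overwrites.

```rust
use std::io::{self, BufRead};

/// Hypothetical variant of the question's `CharGetter` that avoids
/// storing a borrowed iterator next to the String it borrows from.
struct CharGetter<R> {
    reader: R,
    /// Buffer containing one line of input at a time.
    input_buf: String,
    /// Byte offset within `input_buf` of the next character to return.
    /// Always kept on a char boundary.
    input_pos: usize,
}

impl<R: BufRead> CharGetter<R> {
    fn new(reader: R) -> Self {
        CharGetter { reader, input_buf: String::new(), input_pos: 0 }
    }

    /// Returns the next character, reading another line when the
    /// current one is used up. `Ok(None)` signals end of input.
    fn next_char(&mut self) -> io::Result<Option<char>> {
        loop {
            // Create a temporary iterator over the unread tail of the line.
            if let Some(c) = self.input_buf[self.input_pos..].chars().next() {
                // Advance by the encoded width so we stay on a boundary.
                self.input_pos += c.len_utf8();
                return Ok(Some(c));
            }
            // Line exhausted: discard it and fetch the next one.
            self.input_buf.clear();
            self.input_pos = 0;
            if self.reader.read_line(&mut self.input_buf)? == 0 {
                return Ok(None);
            }
        }
    }
}
```

Passing `io::stdin().lock()` as the reader would give the stdin behaviour from the question. Taking any BufRead also means the struct can be exercised with an in-memory `io::Cursor`.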
Shepmaster answered Oct 27 '22 23:10
