
Why isn't this a dangling pointer?

Tags:

rust

I am working my way through the Rust book, and it has this code snippet:

fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}

In the previous chapter, it was explained that

  • the Rust compiler prevents you from retaining references into objects after they go out of scope (a dangling reference), and
  • variables go out of scope at the last moment they are mentioned (the example given of this showed the creation of both an immutable and mutable reference to the same object in the same block by ensuring that the immutable reference was not mentioned after the creation of the mutable reference).

To me, it looks like bytes is not referenced after the for loop header (presumably the code associated with bytes.iter().enumerate() is executed just once before the loop starts, not on every loop iteration), so &item shouldn't be allowed to be a reference into any part of bytes. But I don't see any other object (is "object" the right Rust terminology?) that it could be a reference into.

It's true that s is still in scope, but, well... does the compiler even remember the connection between bytes and s by the time the for loop rolls around? Indeed, even if I change the function to accept bytes directly, the compiler thinks things are hunky-dory:

fn first_word(bytes: &[u8]) -> usize {
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i+1;
        }
    }

    0
}

Here no variable other than i or item is mentioned after the for loop header, so it really seems like &item can't be a reference into anything!

I looked at this very similarly-titled question. The comments and answers there suggest that there might be a lifetime argument which is keeping bytes alive, and proposed a way to ask the compiler what it thinks the type/lifetime is. I haven't learned about lifetimes yet, so I'm fumbling about in the dark a little, but I tried:

fn first_word<'x_lifetime>(s: &String) -> usize {
    let bytes = s.as_bytes();
    let x_variable: &'x_lifetime () = bytes;

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i+1;
        }
    }

    0
}

However, the error, while indeed indicating that bytes is a &[u8], which I understand, also seems to imply there's no extra lifetime information associated with bytes. I say this because the error includes lifetime information in the "expected" part but not in the "found" part. Here it is:

error[E0308]: mismatched types
 --> test.rs:3:39
  |
3 |     let x_variable: &'x_lifetime () = bytes;
  |                     ---------------   ^^^^^ expected `()`, found slice `[u8]`
  |                     |
  |                     expected due to this
  |
  = note: expected reference `&'x_lifetime ()`
             found reference `&[u8]`

So what is going on here? Obviously some part of my reasoning is off, but what? Why isn't item a dangling reference?

asked Sep 01 '21 14:09 by Daniel Wagner


1 Answer

Here is a version of the first function with all the lifetimes and types included, and the for loop replaced with an equivalent while let loop.

fn first_word<'a>(s: &'a String) -> usize {
    let bytes: &'a [u8] = s.as_bytes();

    let mut iter: std::iter::Enumerate<std::slice::Iter<'a, u8>>
        = bytes.iter().enumerate();

    while let Some(iter_item) = iter.next() {
        let (i, &item): (usize, &'a u8) = iter_item;
        if item == b' ' {
            return i;
        }
    }

    s.len()
}

Things to notice here:

  • The type of the iterator produced by bytes.iter().enumerate() has a lifetime parameter which ensures the iterator does not outlive the [u8] it iterates over. (Note that the &[u8] reference itself can go away; it isn't needed. What matters is that its referent, the bytes inside the String, stays alive.) So we only really need to think about one lifetime 'a, not separate lifetimes for s and bytes, because there's only one byte slice inside the String, which we're referring to in different ways.
  • iter, an explicit variable corresponding to the implicit action of for, is used in every iteration of the loop, so the borrow it holds stays in use until the loop ends.
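
As a minimal sketch of the parenthetical note in the first point, the variant below confines the bytes variable to an inner block, so the &[u8] reference is gone before the loop starts. The function name first_space and the block structure are hypothetical, chosen only for illustration. It should still compile, because the iterator borrows the String's bytes for 'a rather than borrowing the local variable bytes.

```rust
// Hypothetical variant of `first_word`, for illustration only.
fn first_space<'a>(s: &'a String) -> usize {
    // The local variable `bytes` exists only inside this inner block.
    let mut iter: std::iter::Enumerate<std::slice::Iter<'a, u8>> = {
        let bytes: &'a [u8] = s.as_bytes();
        bytes.iter().enumerate()
    }; // The `bytes` reference is gone here; the String's bytes are not.

    // The iterator still points into the String's data, which lives for 'a.
    while let Some((i, &item)) = iter.next() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}
```

Calling first_space(&String::from("hello world")) returns the index of the first space, just as the original function does.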

I see another misunderstanding, not exactly about the lifetimes and scope:

there might be a lifetime argument which is keeping bytes alive,

Lifetimes never keep something alive. Lifetimes never affect the execution of the program; they never affect when something is dropped or deallocated. A lifetime in some type such as &'a u8 is a compile-time claim that values of that type will be valid for that lifetime. Changing the lifetimes in a program changes only what is to be proven (checked) by the compiler, not what is true about the program.
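
One way to see this for yourself is the sketch below. Everything in it (the Noisy type, the two name_* functions, and run) is a hypothetical helper made up for this illustration. Noisy records its own drop in a log, and the same borrow is made once through a function with elided lifetimes and once through one with the lifetimes spelled out. Under these assumptions the log should come out the same either way: the value is dropped at the end of its owner's scope, regardless of how the lifetimes are written.

```rust
use std::cell::RefCell;

// Hypothetical helper type: records its own drop in a shared log.
struct Noisy<'log> {
    name: &'static str,
    log: &'log RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Same body twice: lifetimes elided, then spelled out.
fn name_elided(n: &Noisy<'_>) -> &'static str {
    n.name
}

fn name_named<'a, 'log>(n: &'a Noisy<'log>) -> &'static str {
    n.name
}

// Returns the sequence of events observed at runtime.
fn run(use_named: bool) -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let owner = Noisy { name: "owner", log: &log };
        let name = if use_named {
            name_named(&owner)
        } else {
            name_elided(&owner)
        };
        log.borrow_mut().push(format!("used {}", name));
    } // `owner` is dropped here, whichever function was called.
    log.borrow_mut().push(String::from("after scope"));
    log.into_inner()
}
```

Comparing run(false) with run(true) shows identical event sequences; the annotations changed what the compiler had to check, not what the program does.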

answered Oct 07 '22 15:10 by Kevin Reid
