
Using str and String interchangeably

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:

fn main() {
    let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();

    for t in v.iter_mut() {
        if (t.contains("$world")) {
            *t = &t.replace("$world", "Earth");
        }
    }

    println!("{:?}", &v);
}

But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?

Timmmm asked Jul 06 '15 07:07

1 Answer

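Rust has exactly what you want in the form of the Cow (clone-on-write) type. A Cow<str> is either Cow::Borrowed(&str) or Cow::Owned(String), and it dereferences to str in both cases.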

use std::borrow::Cow;

fn main() {
    let mut v: Vec<_> = "Hello there $world!".split_whitespace()
                                             .map(|s| Cow::Borrowed(s))
                                             .collect();

    for t in v.iter_mut() {
        if t.contains("$world") {
            *t.to_mut() = t.replace("$world", "Earth");
        }
    }

    println!("{:?}", &v);
}

As @sellibitze correctly notes, calling to_mut() on a Cow::Borrowed first clones the borrowed text into a new String, so a heap allocation is spent on a copy of the old value that is immediately overwritten. If you are sure you only have borrowed strings, you can instead use

*t = Cow::Owned(t.replace("$world", "Earth"));
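The same idea can be moved into the map step, so that each token is created as either borrowed or owned right away and no second pass is needed. The following is a minimal sketch of that variation, not the answer's own code; the helper names substitute and expand are made up for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the token unchanged (borrowed) unless it
/// contains "$world", in which case a new owned String is allocated.
fn substitute(token: &str) -> Cow<'_, str> {
    if token.contains("$world") {
        Cow::Owned(token.replace("$world", "Earth"))
    } else {
        Cow::Borrowed(token)
    }
}

/// Hypothetical helper: splits `input` on whitespace and substitutes
/// each token, borrowing from `input` wherever no change is needed.
fn expand(input: &str) -> Vec<Cow<'_, str>> {
    input.split_whitespace().map(substitute).collect()
}
```

Calling expand("Hello there $world!") leaves the first two tokens as Cow::Borrowed slices of the input and allocates only for the token that contained the placeholder.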

If the Vec contains Cow::Owned elements, this would still throw away their existing allocation. You can prevent that by using the following very fragile and unsafe code inside your for loop. It manipulates the bytes of the UTF-8 string directly and relies on the fact that "Earth" happens to be exactly as many bytes as "world", so only the $ sign has to be removed.

let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
    let p = pos + last_pos; // find returns an offset relative to last_pos
    last_pos = p + 5; // continue searching right after the inserted "Earth"
    unsafe {
        let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
        s.remove(p); // remove $ sign
        for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
            *sc = c;
        }
    }
}

Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
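If the replacement does not have a convenient length, a safe alternative is to edit the owned buffer with String::replace_range from the standard library. This is a sketch of a different technique, not the code from the answer above; the function name replace_in_place is hypothetical. It keeps the existing String when the element is already Cow::Owned, although the buffer may still reallocate if the text grows beyond its capacity.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces every occurrence of `from` with `to`
/// inside `t`, editing the owned buffer instead of building a new String.
/// A borrowed value is only converted to an owned one if a match is found.
fn replace_in_place(t: &mut Cow<'_, str>, from: &str, to: &str) {
    if from.is_empty() {
        return; // an empty pattern would match at every position
    }
    let mut last_pos = 0; // so we don't start at the beginning every time
    while let Some(pos) = t[last_pos..].find(from) {
        let p = last_pos + pos; // find returns an offset relative to last_pos
        // replace_range checks char boundaries and shifts the tail as needed
        t.to_mut().replace_range(p..p + from.len(), to);
        last_pos = p + to.len(); // skip past the text that was just inserted
    }
}
```

Inside the for loop this would be called as replace_in_place(t, "$world", "Earth"), and it works for other mappings without touching unsafe code.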

oli_obk answered Sep 19 '22 02:09