Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
    let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
    for t in v.iter_mut() {
        if t.contains("$world") {
            *t = &t.replace("$world", "Earth");
        }
    }
    println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
String is the dynamic heap string type, like Vec: use it when you need to own or modify your string data. str is an immutable sequence of UTF-8 bytes of dynamic length somewhere in memory. Since its size is not known at compile time, one can only handle it behind a pointer, most commonly as &str.
String is to str what Vec<T> is to a slice [T]: the String owns the buffer, and a str is a view into it. Because str is an unsized type, you almost never manipulate a bare str; what you actually work with is a borrowed "string slice", &str (or &mut str for in-place changes).
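To make the owner/view relationship concrete, here is a minimal sketch. The helper first_word is a hypothetical name introduced only for this illustration; it returns a &str that points into the caller's data without copying.

```rust
// Hypothetical helper: returns a view into `text`, never a copy.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    // String owns its heap buffer and can grow.
    let mut owned = String::from("Hello there");
    owned.push_str(" $world!");

    // &str borrows a range of that same buffer.
    let word: &str = first_word(&owned);
    assert_eq!(word, "Hello");
    // Same starting address: no bytes were copied.
    assert_eq!(word.as_ptr(), owned.as_ptr());
    println!("{}", word);
}
```

This is the property a zero-copy parser relies on: every token is just a pointer and a length into the original input.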
According to The Rust Reference, a string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself. Escape sequences in the string literal body are processed, so the body cannot contain an unescaped double-quote.
Rust has exactly what you want in the form of the Cow (clone-on-write) type.
use std::borrow::Cow;

fn main() {
    let mut v: Vec<_> = "Hello there $world!".split_whitespace()
        .map(|s| Cow::Borrowed(s))
        .collect();
    for t in v.iter_mut() {
        if t.contains("$world") {
            *t.to_mut() = t.replace("$world", "Earth");
        }
    }
    println!("{:?}", &v);
}
As @sellibitze correctly notes, calling to_mut() on a borrowed value first clones it into a new String, a heap allocation that is then immediately overwritten. If you are sure you only have borrowed strings, you can instead use
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the existing allocation. You can prevent that by putting the following very fragile and unsafe code inside your for loop. It manipulates the UTF-8 bytes of the string directly and relies on the fact that the replacement is exactly one byte shorter than the pattern, so that removing the $ sign leaves exactly five bytes to overwrite.
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
    let p = pos + last_pos; // find returns an offset relative to last_pos
    last_pos = p + 5; // resume right after the 5 bytes of "Earth"
    unsafe {
        let s = t.to_mut().as_mut_vec(); // operating on Vec<u8> is easier
        s.remove(p); // remove the $ sign
        // overwrite "world" with "Earth", byte for byte
        for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
            *sc = c;
        }
    }
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
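For comparison, here is a minimal safe sketch that makes the borrow-or-own decision once per token while the Vec is being built, so there is no need for to_mut() or unsafe code. The function names substitute and expand are hypothetical helpers introduced for this illustration, not part of the standard library.

```rust
use std::borrow::Cow;

// Hypothetical helper: borrows the token unless a substitution is needed.
fn substitute<'a>(token: &'a str, pattern: &str, replacement: &str) -> Cow<'a, str> {
    if token.contains(pattern) {
        // Only this branch allocates a new String.
        Cow::Owned(token.replace(pattern, replacement))
    } else {
        // Zero-copy: just a pointer into the original input.
        Cow::Borrowed(token)
    }
}

// Hypothetical helper: tokenizes the input and applies the "$world" mapping.
fn expand(input: &str) -> Vec<Cow<'_, str>> {
    input
        .split_whitespace()
        .map(|t| substitute(t, "$world", "Earth"))
        .collect()
}

fn main() {
    let v = expand("Hello there $world!");
    println!("{:?}", v); // prints ["Hello", "there", "Earth!"]

    // Untouched tokens are still borrowed; only the substituted one is owned.
    assert!(matches!(v[0], Cow::Borrowed(_)));
    assert!(matches!(v[2], Cow::Owned(_)));
}
```

Unlike the unsafe version, this works for patterns and replacements of any length, at the cost of one allocation per substituted token.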