I want to create a substring in Rust. It should start at an occurrence of a pattern and end either four characters before the end of the string or at a certain character.
My first approach was:
string[string.find("pattern").unwrap()..string.len()-5]
That is wrong because Rust strings are UTF-8 encoded, so their indices are byte offsets, not character positions.
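To make the failure concrete, here is a small sketch (the helper name `naive_slice` and the sample string are illustrative, not from the question). It reproduces the byte-based end offset and uses `str::get`, the non-panicking counterpart of `&s[a..b]`, which returns `None` when an offset does not fall on a character boundary.

```rust
/// Hypothetical helper: slices `s` from `start` to `len() - 4` *bytes*,
/// like the first approach, but returns `None` instead of panicking
/// when either offset is not on a character boundary.
fn naive_slice(s: &str, start: usize) -> Option<&str> {
    let end = s.len().checked_sub(4)?;
    s.get(start..end)
}

fn main() {
    let s = "xxpatternLöwe老虎"; // 20 bytes, but only 15 characters
    let start = s.find("pattern").unwrap();
    // byte 16 falls inside '老', which occupies bytes 14..17
    println!("{}", s.is_char_boundary(s.len() - 4));
    // `&s[start..s.len() - 4]` would panic here; `get` yields None instead
    println!("{:?}", naive_slice(s, start));
}
```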
My second approach is correct but too verbose:
let start_bytes = line.find("pattern").unwrap();
let mut char_byte_counter = 0;
let result = line.chars()
    .skip_while(|c| {
        // skip characters whose starting byte offset lies before the pattern
        let skip = char_byte_counter < start_bytes;
        char_byte_counter += c.len_utf8();
        skip
    })
    .take_while(|c| *c != '<')
    .collect::<String>();
Are there simpler ways to create substrings? Is there any part of the standard library I did not find?
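One simplification of the iterator approach, sketched here with a hypothetical helper name: `find` always returns a byte index on a character boundary, so you can slice at it directly and drop the manual byte counting. This still allocates a `String`, like the original.

```rust
/// Hypothetical helper: collects the characters from the first occurrence
/// of `pattern` up to (but not including) the first '<' after it.
/// Returns `None` if `pattern` does not occur in `line`.
fn after_pattern_until_lt(line: &str, pattern: &str) -> Option<String> {
    // `find` returns a byte offset that is always a valid char boundary,
    // so slicing there cannot panic
    let start = line.find(pattern)?;
    Some(line[start..].chars().take_while(|&c| c != '<').collect())
}

fn main() {
    let line = "asdfapatterndf1老虎23<12";
    println!("{:?}", after_pattern_until_lt(line, "pattern"));
}
```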
You can use the contains() method to check whether a string contains a particular substring or pattern. It takes a single argument, the pattern to look for, and returns a bool.
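A short sketch of `str::contains` (the helper `has_pattern_and_tag` is hypothetical). The argument can be a `&str`, a `char`, or a closure over `char`. Because it only answers yes or no, `find` is what you need when you also want the position.

```rust
/// Hypothetical helper: true if `line` contains both "pattern" and a '<'.
fn has_pattern_and_tag(line: &str) -> bool {
    // a &str pattern and a char pattern
    line.contains("pattern") && line.contains('<')
}

fn main() {
    let line = "asdfapatterndf1老虎23<12";
    println!("{}", has_pattern_and_tag(line));
    // a closure pattern: does the line contain any non-ASCII character?
    println!("{}", line.contains(|c: char| !c.is_ascii()));
}
```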
Indexing into a string is often a bad idea because it's not clear what the return type of the string-indexing operation should be: a byte value, a character, a grapheme cluster, or a string slice. That is one of the reasons why Rust does not allow indexing a string with an integer such as s[0].
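The standard library makes you pick one of those interpretations explicitly. The sketch below shows three of them on an illustrative two-character string; grapheme clusters have no std API (the third-party `unicode-segmentation` crate is a common choice for them).

```rust
fn main() {
    let s = "老虎"; // 2 characters, 6 bytes: each takes 3 bytes in UTF-8

    // 1. A byte value: indexing the underlying byte slice is allowed.
    let first_byte: u8 = s.as_bytes()[0];

    // 2. A character: `chars()` decodes from the start, so `nth` is O(n).
    let second_char: Option<char> = s.chars().nth(1);

    // 3. A string slice: the range is in bytes and must fall on char boundaries.
    let first_char_str: Option<&str> = s.get(0..3); // Some("老")
    let partial: Option<&str> = s.get(0..1); // None: would cut '老' in half

    println!("{:#x} {:?} {:?} {:?}", first_byte, second_char, first_char_str, partial);
}
```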
I don't know of a built-in library function in any language that does exactly what you want (give me the substring between two patterns, or from the first pattern to the end if the second does not exist), so I think you would have to write some custom logic anyway.
The closest equivalent to a "substring" function is slicing. However, as you found out, it works with byte offsets, not with Unicode characters, so you have to be careful with indices: in "Löwe", the 'e' is at byte index 4, not 3. You can still use slicing in your case, because you are not computing indices by hand; you use find to get the index you need, and find always returns a byte offset that lies on a character boundary.
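A quick sketch of the "Löwe" example: `find` reports byte offsets, `chars().position` reports character positions, and `char_indices` pairs each character with its starting byte offset.

```rust
fn main() {
    let s = "Löwe"; // 'ö' takes 2 bytes in UTF-8

    let byte_idx = s.find('e'); // Some(4): byte offset, what slicing uses
    let char_idx = s.chars().position(|c| c == 'e'); // Some(3): character position

    // `find` always lands on a char boundary, so this slice cannot panic
    let from_w = &s[s.find('w').unwrap()..]; // "we"

    println!("{:?} {:?} {}", byte_idx, char_idx, from_w);

    // each character with the byte offset where it starts
    for (byte, ch) in s.char_indices() {
        println!("{}: {}", byte, ch);
    }
}
```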
Here's how you could do it with slicing (as a bonus, you don't need to allocate another String):
// adding some Unicode to check that everything works
// also outside of ASCII
let line = "asdfapatterndf1老虎23<12";
// byte index where "pattern" starts, or the beginning of the line if "pattern" is not found
let start_bytes = line.find("pattern").unwrap_or(0);
// byte index of the first '<' after the start, or the end of the line;
// searching only after start_bytes avoids a panic when a '<' appears before "pattern"
let end_bytes = line[start_bytes..].find('<').map_or(line.len(), |i| start_bytes + i);
// slicing line returns "patterndf1老虎23"
let result = &line[start_bytes..end_bytes];
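The question also asks for a fallback that ends four characters before the end when the end character is missing. Here is a minimal sketch of both rules in one function, assuming "characters" means Unicode scalar values (Rust `char`s), not grapheme clusters. The name `extract` and its `None` behaviour are illustrative choices, not a standard API.

```rust
/// Hypothetical helper: returns the part of `s` starting at the first
/// occurrence of `start_pat` and ending just before the first `end_char`
/// that follows it. If no such character exists, the slice instead ends
/// four chars before the end of `s`. Returns `None` if `start_pat` is
/// missing or the text from the match onward has fewer than four chars.
fn extract<'a>(s: &'a str, start_pat: &str, end_char: char) -> Option<&'a str> {
    let start = s.find(start_pat)?;
    let rest = &s[start..];
    let end = match rest.find(end_char) {
        Some(i) => i,
        // byte offset where the last four chars of `rest` begin
        None => rest.char_indices().rev().nth(3)?.0,
    };
    Some(&rest[..end])
}

fn main() {
    println!("{:?}", extract("asdfapatterndf1老虎23<12", "pattern", '<'));
    println!("{:?}", extract("xxpatternLöwe老虎", "pattern", '<'));
    println!("{:?}", extract("no match here", "pattern", '<'));
}
```

Like the slicing answer, this returns a borrowed `&str`, so nothing is allocated; call `.to_string()` on the result if you need an owned `String`.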