
How do I get a substring between two patterns in Rust?

Tags: string, rust

I want to create a substring in Rust. It should start at the first occurrence of a pattern and end either four characters before the end of the string or at a certain character.

My first approach was

string[string.find("pattern").unwrap()..string.len()-5]

That is wrong because Rust strings are UTF-8 encoded, so slice indices are byte offsets, not character positions.

My second approach is correct but too verbose:

    let start_bytes = line.find("pattern").unwrap();
    let mut char_byte_counter = 0;
    let result = line.chars()
        .skip_while(|c| {
            // Skip every char that ends at or before the byte where "pattern" starts.
            char_byte_counter += c.len_utf8();
            start_bytes >= char_byte_counter
        })
        .take_while(|c| *c != '<')
        .collect::<String>();

Are there simpler ways to create substrings? Is there any part of the standard library I did not find?

JDemler asked Jun 13 '16 07:06




1 Answer

I don't remember a built-in library function in other languages that works exactly the way you want (give me the substring between two patterns, or between the first and the end if the second does not exist). I think you would have to write some custom logic anyway.

The closest equivalent to a "substring" function is slicing. However, as you found out, it works with byte offsets, not Unicode characters, so you have to be careful with indices. In "Löwe", the 'e' is at byte index 4, not 3, because 'ö' takes two bytes in UTF-8. You can still use slicing in your case, because you are not computing indices by hand: find returns a byte offset that always falls on a character boundary, so it can be used directly as a slice bound.
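To make the byte-versus-character difference concrete, here is a minimal sketch. The helper names `byte_index` and `char_position` are made up for illustration; they only wrap the standard `str::find` and `Iterator::position`.

```rust
/// Byte offset of the first `needle` in `s`; this is the kind of index slicing expects.
fn byte_index(s: &str, needle: char) -> Option<usize> {
    s.find(needle)
}

/// Position of the first `needle` in `s`, counted in chars (Unicode scalar values).
/// This number is NOT generally usable as a slice bound.
fn char_position(s: &str, needle: char) -> Option<usize> {
    s.chars().position(|c| c == needle)
}
```

For "Löwe" and 'e', `byte_index` gives `Some(4)` while `char_position` gives `Some(3)`. Slicing with the byte index works; slicing at byte 2 would panic, because that offset lies in the middle of 'ö'.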

Here's how you could do it with slicing (as a bonus, it does not allocate a new String):

// adding some Unicode to check that everything also works
// outside of ASCII
let line = "asdfapatterndf1老虎23<12";

// byte index where "pattern" starts, or the beginning of line if not found
let start_bytes = line.find("pattern").unwrap_or(0);
// byte index where "<" is found, or the end of line if not found
let end_bytes = line.find("<").unwrap_or(line.len());

// slicing line, returns "patterndf1老虎23"
let result = &line[start_bytes..end_bytes];
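If you need this in more than one place, the same idea could be wrapped in small helpers. This is only a sketch under a few assumptions: the names `substring_between` and `drop_last_chars` are hypothetical, the end pattern is searched only after the start position (so a "<" that appears before "pattern" cannot produce an inverted range and a panic), and the second helper covers the "end of the string minus N characters" case from the question.

```rust
/// Returns the part of `line` from the first `start_pat` up to (not including)
/// the first `end_pat` found at or after that position.
/// Falls back to the start of `line` if `start_pat` is missing and to the
/// end of `line` if `end_pat` is missing.
fn substring_between<'a>(line: &'a str, start_pat: &str, end_pat: &str) -> &'a str {
    let start = line.find(start_pat).unwrap_or(0);
    // Search only the tail, then shift the offset back into `line` coordinates.
    let end = line[start..]
        .find(end_pat)
        .map(|offset| start + offset)
        .unwrap_or(line.len());
    &line[start..end]
}

/// Returns `s` without its last `n` characters (chars, not bytes).
/// Yields an empty slice if `s` has fewer than `n` characters.
fn drop_last_chars(s: &str, n: usize) -> &str {
    if n == 0 {
        return s;
    }
    // Walk the (byte offset, char) pairs from the back; the n-th one marks the cut.
    match s.char_indices().rev().nth(n - 1) {
        Some((idx, _)) => &s[..idx],
        None => "",
    }
}
```

Both helpers return slices of the input, so nothing is allocated. For the "pattern to four characters before the end" case, you could slice from the index returned by find and pass the result to `drop_last_chars` with `n = 4`.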
Paolo Falabella answered Sep 24 '22 03:09