How to find the starting offset of a string slice of another string? [duplicate]

Tags: string, rust

Given a string and a slice referring to some substring, is it possible to find the starting and ending index of the slice?

I have a ParseString function which takes in a reference to a string, and tries to parse it according to some grammar:

fn ParseString(inp_string: &str) -> Result<(), &str>

If the parsing is fine, the result is just Ok(()), but if there's some error, it usually is in some substring, and the error instance is Err(e), where e is a slice of that substring.

When given the substring where the error occurs, I want to say something like "Error from characters x to y", where x and y are the starting and ending indices of the erroneous substring.

I don't want to encode the position of the errors directly in Err, because I'm nesting these invocations, and the offsets in a nested slice might not correspond to the offsets of the same slice in the top-level string.

sayantankhan asked Jun 10 '18 07:06


2 Answers

As long as all of your string slices borrow from the same string buffer, you can calculate offsets with simple pointer arithmetic. You need the following methods:

  • str::as_ptr(): Returns the pointer to the start of the string slice
  • A way to get the difference between two pointers. The easiest way is to cast both pointers to usize (which is always a no-op) and subtract them. The pointer method offset_from(), which was unstable and nightly-only when this answer was written and was stabilized in Rust 1.47, is slightly nicer, but it is unsafe to call.
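
As a rough sketch of the second bullet's alternative, this is what the offset_from() route could look like on a toolchain where the method is stable. The function name offset_in is made up for illustration, and the code assumes the caller guarantees that part really borrows from whole; otherwise the call is undefined behavior.

```rust
/// Hypothetical helper: byte offset of `part` inside `whole`.
/// Assumption: `part` is a subslice of `whole` (same allocation).
fn offset_in(whole: &str, part: &str) -> usize {
    // SAFETY: sound only if both pointers point into the same string
    // buffer, which this sketch leaves to the caller to guarantee.
    let diff = unsafe { part.as_ptr().offset_from(whole.as_ptr()) };
    diff as usize
}

fn main() {
    let input = "hello world";
    let part = &input[6..];
    println!("'{}' starts at byte {}", part, offset_in(input, part));
}
```

Because of that safety requirement, the cast-to-usize version is probably the more forgiving choice when the slices might come from different buffers.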

Here is working code (Playground):

fn get_range(whole_buffer: &str, part: &str) -> (usize, usize) {
    let start = part.as_ptr() as usize - whole_buffer.as_ptr() as usize;
    let end = start + part.len();
    (start, end)
}

fn main() {
    let input = "Everyone ♥ Ümläuts!";

    let part1 = &input[1..7];
    println!("'{}' has offset {:?}", part1, get_range(input, part1));

    let part2 = &input[7..16];
    println!("'{}' has offset {:?}", part2, get_range(input, part2));
}
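
Note that these offsets are byte offsets, not character counts, which matters for a message like "Error from characters x to y" when the input contains non-ASCII text. The following is a minimal sketch of how the pieces might fit together; parse_words is a made-up stand-in for the asker's parser, and byte_range, char_range and describe_error are hypothetical helper names.

```rust
/// Hypothetical parser used only for illustration: it fails on the first
/// whitespace-separated word containing an ASCII digit and returns that word.
fn parse_words(input: &str) -> Result<(), &str> {
    match input
        .split_whitespace()
        .find(|w| w.chars().any(|c| c.is_ascii_digit()))
    {
        Some(bad) => Err(bad),
        None => Ok(()),
    }
}

/// Byte range of `part` within `whole`; assumes `part` borrows from `whole`.
fn byte_range(whole: &str, part: &str) -> (usize, usize) {
    let start = part.as_ptr() as usize - whole.as_ptr() as usize;
    (start, start + part.len())
}

/// Converts a byte range into a character range by counting `char`s.
fn char_range(whole: &str, (start, end): (usize, usize)) -> (usize, usize) {
    let first = whole[..start].chars().count();
    let len = whole[start..end].chars().count();
    (first, first + len)
}

/// Builds the error message from the question, or `None` if parsing succeeds.
fn describe_error(input: &str) -> Option<String> {
    let bad = parse_words(input).err()?;
    let (x, y) = char_range(input, byte_range(input, bad));
    Some(format!("Error from characters {} to {}", x, y))
}

fn main() {
    let input = "Ümläuts ♥ r0ck";
    if let Some(msg) = describe_error(input) {
        println!("{}", msg);
    }
}
```

Counting chars gives Unicode scalar values; if user-visible "characters" should mean grapheme clusters, a different counting step would be needed.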
Lukas Kalbertodt answered Oct 10 '22 01:10


Rust actually used to have an unstable method for doing exactly this, but it was removed due to being obsolete, which was a bit odd considering the replacement didn't remotely have the same functionality.

That said, the implementation isn't that big, so you can just add the following to your code somewhere:

pub trait SubsliceOffset {
    /**
    Returns the byte offset of an inner slice relative to an enclosing outer slice.

    Examples

    ```ignore
    let string = "a\nb\nc";
    let lines: Vec<&str> = string.lines().collect();
    assert!(string.subslice_offset_stable(lines[0]) == Some(0)); // &"a"
    assert!(string.subslice_offset_stable(lines[1]) == Some(2)); // &"b"
    assert!(string.subslice_offset_stable(lines[2]) == Some(4)); // &"c"
    assert!(string.subslice_offset_stable("other!") == None);
    ```
    */
    fn subslice_offset_stable(&self, inner: &Self) -> Option<usize>;
}

impl SubsliceOffset for str {
    fn subslice_offset_stable(&self, inner: &str) -> Option<usize> {
        let self_beg = self.as_ptr() as usize;
        let inner = inner.as_ptr() as usize;
        if inner < self_beg || inner > self_beg.wrapping_add(self.len()) {
            None
        } else {
            Some(inner.wrapping_sub(self_beg))
        }
    }
}

You can remove the _stable suffix if you don't need to support old versions of Rust; it's just there to avoid a name conflict with the now-removed subslice_offset method.
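
For the nested-parser case from the question, a variation on the same idea can return the full byte range and also check that the end of the inner slice lies inside the outer one. This is only a sketch; subslice_range is a hypothetical free function, not something from the standard library.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of std): returns the byte range of `inner`
/// inside `outer`, or `None` unless `inner` lies entirely within `outer`.
fn subslice_range(outer: &str, inner: &str) -> Option<Range<usize>> {
    let outer_beg = outer.as_ptr() as usize;
    let inner_beg = inner.as_ptr() as usize;
    // `checked_sub` yields `None` when `inner` starts before `outer`.
    let start = inner_beg.checked_sub(outer_beg)?;
    let end = start.checked_add(inner.len())?;
    if end <= outer.len() {
        Some(start..end)
    } else {
        None
    }
}

fn main() {
    let top = String::from("key = value");
    let rhs = &top[6..]; // "value", a slice handed to a nested parser
    let tail = &rhs[2..]; // "lue", a slice of the nested slice

    // Both resolve against the top-level string, however deep the nesting.
    println!("{:?}", subslice_range(&top, rhs));
    println!("{:?}", subslice_range(&top, tail));

    // A slice of an unrelated buffer is rejected.
    let other = String::from("elsewhere");
    println!("{:?}", subslice_range(&top, &other));
}
```

Since every nested slice still points into the top-level buffer, the offset can be computed once at the outermost level without threading positions through each Err.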

DK. answered Oct 10 '22 00:10