
How to get the byte offset between `&str`

Tags: rust

I have two &str slices pointing into the same string, and I need to know the byte offset between them:

fn main() {
    let foo = "  bar";
    assert_eq!(offset(foo, foo.trim()), Some(2));

    let bar = "baz\nquz";
    let mut lines = bar.lines();
    assert_eq!(offset(bar, lines.next().unwrap()), Some(0));
    assert_eq!(offset(bar, lines.next().unwrap()), Some(4));

    assert_eq!(offset(foo, bar), None); // not a sub-string

    let quz = "quz".to_owned();
    assert_eq!(offset(bar, &quz), None); // not the same string, could also return `Some(4)`, I don't care
}

This is basically the same as str::find, but since the second slice is a sub-slice of the first, I would have hoped for something faster. Also, str::find won't work in the lines() case if several lines are identical.
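To make the str::find problem concrete, here is a minimal sketch. The helper name find_each_line is hypothetical and only exists for this illustration. It looks up every line with str::find, which always reports the first textual match rather than the position of the slice that was passed in.

```rust
// Hypothetical helper for illustration: search for each line of `text`
// with `str::find` and collect the reported byte offsets.
fn find_each_line(text: &str) -> Vec<Option<usize>> {
    text.lines().map(|line| text.find(line)).collect()
}
```

For the input "a\nb\na" the lines really start at bytes 0, 2 and 4, but the third lookup reports 0 because the first line has the same content.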

I thought I could just use some pointer arithmetic to do that, with something like foo.trim().as_ptr() - foo.as_ptr(), but it turns out that Sub is not implemented on raw pointers.

asked Jul 08 '16 13:07 by mcarton


1 Answer

"but it turns out that Sub is not implemented on raw pointers."

You can convert the pointer to a usize to do math on it:

fn main() {
    let source = "hello, world";
    let a = &source[1..];
    let b = &source[5..];
    let diff = b.as_ptr() as usize - a.as_ptr() as usize;
    println!("{}", diff);
}
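The same address arithmetic can be wrapped into the offset function the question asks for. This is a sketch under stated assumptions, not a definitive implementation: the signature is inferred from the assertions in the question, and the bounds check only verifies that the bytes of inner lie within the address range of outer.

```rust
// Sketch of the `offset` function from the question (signature assumed
// from its assertions). Returns the byte offset of `inner` within `outer`,
// or `None` if `inner` does not lie inside `outer`'s address range.
fn offset(outer: &str, inner: &str) -> Option<usize> {
    let outer_start = outer.as_ptr() as usize;
    let inner_start = inner.as_ptr() as usize;

    // `None` if `inner` starts before `outer`.
    let diff = inner_start.checked_sub(outer_start)?;

    // `inner` must also end no later than `outer` does.
    if diff.checked_add(inner.len())? <= outer.len() {
        Some(diff)
    } else {
        None
    }
}
```

This is safe code and does no searching, so it runs in constant time. One caveat: an empty inner slice from an unrelated string could happen to sit at an address inside outer and produce Some(..), which matches the "I don't care" case in the question.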

There's also the offset_from method (stable since Rust 1.47; older compilers need nightly and #![feature(ptr_offset_from)]):

fn main() {
    let source = "hello, world";
    let a = &source[1..];
    let b = &source[5..];
    // I copied this unsafe code from Stack Overflow without
    // reading the text that told me how to know if this was safe
    let diff = unsafe { b.as_ptr().offset_from(a.as_ptr()) };
    println!("{}", diff);
}

Please be sure to read the documentation for this method as it describes under what circumstances it will not cause undefined behavior.
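As a rough illustration of what respecting those conditions might look like, the sketch below guards the unsafe call with an address-range check. The function name offset_checked is hypothetical. The check is a guard against accidental misuse and not a proof that both pointers come from the same allocation, so the assumption that callers pass a slice taken from outer still applies.

```rust
// Hypothetical wrapper around `offset_from` with a range check up front.
fn offset_checked(outer: &str, inner: &str) -> Option<usize> {
    let outer_range = outer.as_bytes().as_ptr_range();
    let inner_range = inner.as_bytes().as_ptr_range();

    // Bail out unless `inner` lies entirely within `outer`'s addresses.
    if inner_range.start < outer_range.start || inner_range.end > outer_range.end {
        return None;
    }

    // SAFETY (assumption): `inner` was sliced from `outer`, so both pointers
    // belong to the same allocation. The element type is `u8`, so the byte
    // distance is always a multiple of the element size.
    let diff = unsafe { inner_range.start.offset_from(outer_range.start) };

    // The range check above guarantees `diff` is non-negative.
    Some(diff as usize)
}
```

When inner is a heap copy or otherwise lies outside outer, the function returns None before reaching the unsafe block.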

answered Oct 07 '22 00:10 by Shepmaster