 

How to get pointer offset in bytes?

While raw pointers in Rust have the offset method, this only increments in units of the size of the pointed-to type. How can I offset a pointer by a number of bytes?

Something like this in C:

var_offset = (typeof(var))((char *)(var) + offset);
asked Oct 28 '16 17:10 by ideasman42




2 Answers

TL;DR: This answer invokes Undefined Behavior, according to RFC-2582.

In particular, references must be aligned and dereferenceable, even when they are created and never used.

There is also discussion of whether field accesses themselves impose extra requirements that the proposed &raw does not solve, due to the use of getelementptr inbounds; see "offsetof woes" at the bottom of the RFC.


From the answer I linked to your previous question:

macro_rules! offset_of {
    ($ty:ty, $field:ident) => {
        //  Undefined Behavior: dereferences a null pointer.
        //  Undefined Behavior: accesses field outside of valid memory area.
        unsafe { &(*(0 as *const $ty)).$field as *const _ as usize }
    }
}

fn main() {
    let p: *const Baz = 0x1248 as *const _;
    let p2: *const Foo = ((p as usize) - offset_of!(Foo, memberB)) as *const _;
    println!("{:p}", p2);
}

We can see in the computation of p2 that a pointer can be painlessly converted to an integer (usize here), on which arithmetic is performed, and the result is then cast back to a pointer.

isize and usize are pointer-sized integer types, so arithmetic on them works in units of bytes :)
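
As a minimal sketch of the round trip described above, the C expression from the question can be mirrored in two ways: through usize arithmetic, or by casting to a byte pointer first. The helper names byte_offset_via_usize and byte_offset_via_u8 are hypothetical and chosen only for illustration; the example also assumes the resulting pointer stays inside the same allocation before it is dereferenced.

```rust
/// Hypothetical helper: offsets `ptr` by `bytes` bytes using an integer round trip.
fn byte_offset_via_usize<T>(ptr: *const T, bytes: isize) -> *const T {
    // Arithmetic on the address is counted in bytes, not in `size_of::<T>()` units.
    (ptr as usize).wrapping_add(bytes as usize) as *const T
}

/// Hypothetical helper: offsets `ptr` by `bytes` bytes by going through `*const u8`.
fn byte_offset_via_u8<T>(ptr: *const T, bytes: isize) -> *const T {
    // A `u8` pointer steps one byte at a time. `wrapping_offset` is safe to call;
    // dereferencing the result is only valid inside the original allocation.
    (ptr as *const u8).wrapping_offset(bytes) as *const T
}

fn main() {
    let values: [u32; 4] = [10, 20, 30, 40];
    let first: *const u32 = values.as_ptr();

    let a = byte_offset_via_usize(first, 8);
    let b = byte_offset_via_u8(first, 8);
    assert_eq!(a, b);

    // 8 bytes past the start of a `u32` array is element index 2.
    assert_eq!(unsafe { *b }, 30);
}
```

The byte-pointer variant keeps the value a pointer throughout, which is probably the safer default when the result will be dereferenced. Recent toolchains also provide byte_offset and byte_add methods directly on raw pointers, which should make both helpers unnecessary there.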


Were RFC-2582 to be accepted, this implementation of offset_of! is my best shot:

macro_rules! offset_of {
    ($ty:ty, $field:ident) => {
        unsafe {
            //  Create correctly sized storage.
            //
            //  Note: `let zeroed: $ty = ::std::mem::zeroed();` is incorrect,
            //        a zero pattern is not always a valid value.
            let buffer = ::std::mem::MaybeUninit::<$ty>::uninit();

            //  Create a Raw reference to the storage:
            //  - Alignment does not matter, though is correct here.
            //  - It safely refers to uninitialized storage.
            //
            //  Note: using `&raw const *(&buffer as *const _ as *const $ty)`
            //        is incorrect, it creates a temporary non-raw reference.
            let uninit: &raw const $ty = ::std::mem::transmute(&buffer);

            //  Create a Raw reference to the field:
            //  - Alignment does not matter, though is correct here.
            //  - It points within the memory area.
            //  - It safely refers to uninitialized storage.
            let field = &raw const uninit.$field;

            //  Compute the difference between pointers.
            (field as *const _ as usize) - (uninit as *const _ as usize)
        }
    }
}

I have commented each step with the reasons I believe it is sound, and why some alternatives are not (something I encourage heavily in unsafe code), and I hope I have not missed anything.
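
For comparison, here is a sketch of the same steps using only syntax that compiles on stable Rust today, with ::std::ptr::addr_of! standing in for the raw-reference operator. The macro name offset_of_sketch and the struct Foo with its fields are assumptions made for this example; they are not taken from the RFC or from the code above.

```rust
// Hypothetical example type; `repr(C)` makes the field layout predictable.
#[allow(dead_code)]
#[repr(C)]
struct Foo {
    member_a: u8,
    member_b: u32,
}

macro_rules! offset_of_sketch {
    ($ty:ty, $field:ident) => {{
        // Correctly sized and aligned storage; it is never read as a `$ty`.
        let buffer = ::std::mem::MaybeUninit::<$ty>::uninit();
        let base: *const $ty = buffer.as_ptr();
        // `addr_of!` produces a raw pointer to the field without creating
        // an intermediate reference to uninitialized memory.
        let field = unsafe { ::std::ptr::addr_of!((*base).$field) };
        // Difference between the two addresses, in bytes.
        (field as usize) - (base as usize)
    }};
}

fn main() {
    assert_eq!(offset_of_sketch!(Foo, member_a), 0);
    // `member_b` is a `u32`, so it is placed after padding at offset 4.
    assert_eq!(offset_of_sketch!(Foo, member_b), 4);
}
```

Newer standard libraries also ship a built-in std::mem::offset_of! macro, which is likely the better choice wherever it is available.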

answered Sep 20 '22 10:09 by Matthieu M.



Thanks to @Matthieu M.'s answer, this can be done using pointer offsets. Here's a reusable macro:

macro_rules! offset_of {
    ($ty:ty, $field:ident) => {
        &(*(0 as *const $ty)).$field as *const _ as usize
    }
}

macro_rules! check_type_pair {
    ($a:expr, $b:expr) => {
        if false {
            let _type_check = if false {$a} else {$b};
        }
    }
}

macro_rules! parent_of_mut {
    ($child:expr, $ty:ty, $field:ident) => {
        {
            check_type_pair!(&(*(0 as *const $ty)).$field, &$child);
            let offset = offset_of!($ty, $field);
            &mut *(((($child as *mut _) as usize) - offset) as *mut $ty)
        }
    }
}

macro_rules! parent_of {
    ($child:expr, $ty:ty, $field:ident) => {
        {
            check_type_pair!(&(*(0 as *const $ty)).$field, &$child);
            let offset = offset_of!($ty, $field);
            &*(((($child as *const _) as usize) - offset) as *const $ty)
        }
    }
}

This way, when we have a field in a struct, we can get the parent struct like this:

fn some_method(&self) {
    // Where 'self' is ParentStruct.field,
    // access ParentStruct instance.
    let parent = unsafe { parent_of!(self, ParentStruct, field) };
}

The macro check_type_pair helps avoid simple mistakes where self and ParentStruct.field aren't the same type. However, it's not foolproof when two different members in a struct have the same type.
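
The same parent-from-field idea can be sketched with raw pointers and the standard std::mem::offset_of! macro (assuming a toolchain recent enough to provide it). ParentStruct, Child and parent_of_field below are hypothetical names for illustration. Note the assumption that the child pointer is derived from a pointer to the whole parent; whether starting from a plain &self is permitted by Rust's aliasing rules is a separate question that this sketch sidesteps.

```rust
use std::mem::offset_of;
use std::ptr::addr_of;

// Hypothetical types for illustration.
#[allow(dead_code)]
struct Child {
    value: u8,
}

#[allow(dead_code)]
#[repr(C)]
struct ParentStruct {
    id: u32,
    field: Child,
}

/// Hypothetical helper: steps back from `ParentStruct::field` to its parent.
///
/// # Safety
/// `child` must point at the `field` member of a live `ParentStruct` and
/// must have been derived from a pointer to that whole struct.
unsafe fn parent_of_field(child: *const Child) -> *const ParentStruct {
    // Move backwards by the field's byte offset, then restore the parent type.
    unsafe {
        child
            .cast::<u8>()
            .sub(offset_of!(ParentStruct, field))
            .cast::<ParentStruct>()
    }
}

fn main() {
    let parent = ParentStruct { id: 7, field: Child { value: 1 } };
    let parent_ptr: *const ParentStruct = &parent;

    // Derive the field pointer from the parent pointer, not from `&parent.field`.
    let child_ptr: *const Child = unsafe { addr_of!((*parent_ptr).field) };

    let recovered = unsafe { parent_of_field(child_ptr) };
    assert_eq!(recovered, parent_ptr);
    assert_eq!(unsafe { (*recovered).id }, 7);
}
```

Working in raw pointers until the very end keeps the unsafe surface small: only the final dereference relies on the caller's guarantee that the pointer really came from a ParentStruct.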

answered Sep 21 '22 10:09 by ideasman42
