
Where is Rust storing all these bytes?

Tags:

rust

In trying to understand how stack memory works, I wrote the following code to display addresses of where data gets stored:

fn main() {
    let a = "0123456789abcdef0";
    let b = "123456789abcdef01";
    let c = "23456789abcdef012";

    println!("{:p} {}", &a, a.len());
    println!("{:p} {}", &b, b.len());
    println!("{:p} {}", &c, c.len());
}

The output is:

0x7fff288a5448 17
0x7fff288a5438 17
0x7fff288a5428 17

The addresses are only 16 bytes apart, which implies that all 17 bytes of each string are stored in a space of 16 bytes, and that can't be right. My one guess is that there's some optimization happening, but I get the same results even when I build with --opt-level 0.

The equivalent C seems to do the right thing:

#include <stdio.h>
#include <string.h>

int main() {
    char a[] = "0123456789abcdef";
    char b[] = "123456789abcdef0";
    char c[] = "23456789abcdef01";

    printf("%p %zu\n", &a, strlen(a) + 1);
    printf("%p %zu\n", &b, strlen(b) + 1);
    printf("%p %zu\n", &c, strlen(c) + 1);

    return 0;
}

Output:

0x7fff5837b440 17
0x7fff5837b420 17
0x7fff5837b400 17
asked Aug 24 '14 04:08 by tshepang

1 Answer

String literals "..." are stored in static memory, and the variables a, b, c are just (fat) pointers to them. They have type &str, which has the following layout:

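struct StrSlice {
    // pointer to the first byte of the text
    data: *const u8,
    // number of bytes (`usize` is the current name of the old `uint` type)
    length: usize,
}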

where the data field points at the sequence of bytes that form the text, and the length field says how many bytes there are.
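This layout can be checked from Rust itself. The following is a minimal sketch, assuming a current Rust toolchain; the helper `parts` is a hypothetical name introduced here for illustration, and the printed addresses will differ from run to run.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (address of the text, length in bytes)
/// for a string slice.
fn parts(s: &str) -> (usize, usize) {
    (s.as_ptr() as usize, s.len())
}

fn main() {
    let a = "0123456789abcdef0";

    // A `&str` occupies two machine words: a data pointer plus a length.
    println!("size of &str:      {}", size_of::<&str>());
    println!("two machine words: {}", 2 * size_of::<usize>());

    let (data, len) = parts(a);
    // Address of the variable `a` itself (a local variable)...
    println!("address of `a`:    {:p}", &a);
    // ...versus the address of the text it points to (static memory).
    println!("address of text:   {:#x}", data);
    println!("length in bytes:   {}", len);
}
```

The two addresses typically fall in very different ranges, which shows that `{:p}` applied to `&a` prints where the fat pointer lives, not where the 17 bytes of text live.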

On a 64-bit platform this is 16 bytes (and on a 32-bit platform, 8 bytes). The real equivalent in C (ignoring null termination vs. stored length) would be storing into a const char* instead of a char[]. Changing the C code to do that prints:

0x7fff21254508 17
0x7fff21254500 17
0x7fff212544f8 17

i.e. the pointers are 8 bytes apart.
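Going the other way, the behaviour of the C `char[]` version can be approximated in Rust by copying the bytes into a local array. This is a sketch under the assumption that a fixed-size `[u8; 17]` is an acceptable stand-in for `char[]`; the function name `stack_copy` is made up for this example, and the exact addresses and spacing depend on the compiler and optimisation level.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: returns the text as an owned 17-byte array.
/// A byte-string literal has type `&[u8; 17]`; dereferencing it copies
/// the bytes, much like C's `char a[] = "..."` (minus the trailing NUL).
fn stack_copy() -> [u8; 17] {
    *b"0123456789abcdef0"
}

fn main() {
    let a: [u8; 17] = stack_copy();
    let b: [u8; 17] = *b"123456789abcdef01";

    // Each variable now holds the 17 bytes itself rather than a
    // two-word reference to static memory.
    println!("{:p} {}", &a, size_of_val(&a));
    println!("{:p} {}", &b, size_of_val(&b));

    // The bytes can still be viewed as text.
    println!("{}", std::str::from_utf8(&a).unwrap());
}
```

Here `size_of_val` reports 17 for each variable, whereas for the `&str` variables in the question the variable itself is only the pointer-and-length pair.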

You can check these low-level details using --emit=asm or --emit=llvm-ir, or clicking the corresponding button on the playpen (possibly adjusting the optimisation level too). E.g.

fn main() {
    let a = "0123456789abcdef0";
}

compiled with --emit=llvm-ir and no optimisations gives (with my trimming and annotations):

%str_slice = type { i8*, i64 }

;; global constant with the string's text
@str1042 = internal constant [17 x i8] c"0123456789abcdef0"

; Function Attrs: uwtable
define internal void @_ZN4main20h55efe3c71b4bb8f4eaaE() unnamed_addr #0 {
entry-block:
  ;; create stack space for the `a` variable
  %a = alloca %str_slice

  ;; get a pointer to the first element of the `a` struct (`data`)...
  %0 = getelementptr inbounds %str_slice* %a, i32 0, i32 0
  ;; ... and store the pointer to the string data in it
  store i8* getelementptr inbounds ([17 x i8]* @str1042, i32 0, i32 0), i8** %0

  ;; get a pointer to the second element of the `a` struct (`length`)...
  %1 = getelementptr inbounds %str_slice* %a, i32 0, i32 1
  ;; ... and store the length of the string (17) in it.
  store i64 17, i64* %1
  ret void
}
answered Sep 28 '22 07:09 by huon