
How does Rust retrieve the input argc and argv values from a running program?

Tags:

rust

I know that the entry point of a Rust application is generated by rustc. I inspected the code in compiler/rustc_codegen_ssa/src/base.rs; part of it is shown below:

fn create_entry_fn<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    rust_main: Bx::Value,
    rust_main_def_id: DefId,
    use_start_lang_item: bool,
) -> Bx::Function {
    // The entry function is either `int main(void)` or `int main(int argc, char **argv)`,
    // depending on whether the target needs `argc` and `argv` to be passed in.
    let llfty = if cx.sess().target.main_needs_argc_argv {
        cx.type_func(&[cx.type_int(), cx.type_ptr_to(cx.type_i8p())], cx.type_int())
    } else {
        cx.type_func(&[], cx.type_int())
    };
    // ... (rest of the function omitted)
}

I found something really interesting in the same file, shown below. Judging from the comment, Rust collects argc and argv here, and, if I understand correctly, both values are later passed to the lang_start function.

/// Obtain the `argc` and `argv` values to pass to the rust start function.
fn get_argc_argv<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    bx: &mut Bx,
) -> (Bx::Value, Bx::Value) {
    if cx.sess().target.main_needs_argc_argv {
        // Params from native `main()` used as args for rust start function
        let param_argc = bx.get_param(0);
        let param_argv = bx.get_param(1);
        let arg_argc = bx.intcast(param_argc, cx.type_isize(), true);
        let arg_argv = param_argv;
        (arg_argc, arg_argv)
    } else {
        // The Rust start function doesn't need `argc` and `argv`, so just pass zeros.
        let arg_argc = bx.const_int(cx.type_int(), 0);
        let arg_argv = bx.const_null(cx.type_ptr_to(cx.type_i8p()));
        (arg_argc, arg_argv)
    }
}
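
To make the two branches of get_argc_argv concrete, here is a minimal Rust model of what the generated entry shim does, written as ordinary functions so it can be called directly. Everything in it is hypothetical: fake_start stands in for the real lang_start machinery (it only counts the arguments it can see), and entry_with_args / entry_without_args mimic the `int main(int argc, char **argv)` and `int main(void)` shapes. This is not the code rustc emits.

```rust
use std::ffi::{c_char, c_int};
use std::ptr;

/// Hypothetical stand-in for the Rust start function (`lang_start`):
/// it only counts how many non-null arguments it can see.
///
/// # Safety
/// `argv` must be null or point to at least `argc` readable pointers.
unsafe fn fake_start(argc: isize, argv: *const *const u8) -> isize {
    if argv.is_null() {
        // Targets without `main_needs_argc_argv` pass argc = 0, argv = NULL.
        return 0;
    }
    let mut n = 0;
    while n < argc && unsafe { !(*argv.offset(n)).is_null() } {
        n += 1;
    }
    n
}

/// Models `int main(int argc, char **argv)`: sign-extend `argc` to `isize`
/// (the signed `intcast`) and forward `argv` unchanged.
///
/// # Safety
/// Same contract as `fake_start`.
unsafe fn entry_with_args(argc: c_int, argv: *const *const c_char) -> c_int {
    let arg_argc = argc as isize;
    let arg_argv = argv as *const *const u8;
    unsafe { fake_start(arg_argc, arg_argv) as c_int }
}

/// Models `int main(void)`: there are no native params, so pass zeros.
fn entry_without_args() -> c_int {
    // SAFETY: a null `argv` is explicitly handled by `fake_start`.
    unsafe { fake_start(0, ptr::null()) as c_int }
}
```

The signed `as isize` cast corresponds to the `true` (signed) flag passed to `bx.intcast` in get_argc_argv.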

But I also found another place, library/std/src/sys/unix/args.rs, that seems to do the same thing. For example, when you run a Rust app on macOS, Rust apparently uses two FFI functions (_NSGetArgc / _NSGetArgv) to retrieve argc and argv:

#[cfg(any(target_os = "macos", target_os = "ios"))]
mod imp {
    use super::Args;
    use crate::ffi::CStr;

    pub unsafe fn init(_argc: isize, _argv: *const *const u8) {}

    pub fn cleanup() {}

    #[cfg(target_os = "macos")]
    pub fn args() -> Args {
        use crate::os::unix::prelude::*;
        extern "C" {
            // These functions are in crt_externs.h.
            fn _NSGetArgc() -> *mut libc::c_int;
            fn _NSGetArgv() -> *mut *mut *mut libc::c_char;
        }

        let vec = unsafe {
            let (argc, argv) =
                (*_NSGetArgc() as isize, *_NSGetArgv() as *const *const libc::c_char);
            (0..argc as isize)
                .map(|i| {
                    let bytes = CStr::from_ptr(*argv.offset(i)).to_bytes().to_vec();
                    OsStringExt::from_vec(bytes)
                })
                .collect::<Vec<_>>()
        };
        Args { iter: vec.into_iter() }
    }

    // ... (rest of the module omitted)
}
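
Whichever way the pointers are obtained, the same conversion is needed afterwards: walk argc C strings starting at argv and copy each one into an owned OsString, as the macOS args() above does. A minimal, Unix-only sketch of that step as a standalone function (the name args_from_raw is hypothetical; it uses `add` with a usize index instead of `offset`):

```rust
use std::ffi::{c_char, CStr, OsString};
use std::os::unix::ffi::OsStringExt;

/// Copies a C-style `(argc, argv)` pair into owned `OsString`s.
///
/// # Safety
/// `argv` must be null or point to at least `argc` valid, NUL-terminated
/// C strings.
unsafe fn args_from_raw(argc: isize, argv: *const *const c_char) -> Vec<OsString> {
    if argv.is_null() || argc <= 0 {
        return Vec::new();
    }
    (0..argc as usize)
        .map(|i| {
            // Borrow the i-th C string, then copy its bytes (without the NUL).
            let bytes = unsafe { CStr::from_ptr(*argv.add(i)) }.to_bytes().to_vec();
            // On Unix an OsString is just bytes, so no UTF-8 check is needed.
            OsString::from_vec(bytes)
        })
        .collect()
}
```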

So what's the difference between these two places? Which one actually does the real retrieval?

Jason Yu asked May 08 '21 09:05


1 Answer

To answer "Which place actually does the real retrieval stuff?" directly: it depends on

  • The target OS: Linux, macOS, Windows, WebAssembly
  • The target "environment" (e.g. libc): glibc, musl, wasi, even miri in Rust's case

Basically, argc/argv are either passed as arguments to the program entry point or provided "globally" through functions/syscalls:

  • In the first case (passed as arguments), the Rust compiler generates code that initializes two static values, ARGC and ARGV (located at std/src/sys/unix/args.rs#L87), which std::env::args() then reads on the developer's behalf.

    Note that, depending on the libc used, this phase happens at _start and/or in some ld+libc-specific routine (it gets messy once dynamic linking is taken into account). In the case of glibc, it's done through the non-standard GNU extension that passes argc/argv to "init_array" functions (notably useful for "cdylib" crates, i.e. .so shared libraries): std/src/sys/unix/args.rs#L108-L128

    Also, if you specify the entry point directly with the #[start] attribute, you get direct access to the argc/argv values (compiler/rustc_codegen_ssa/src/base.rs#L447).

  • In the second case, no initialization code is needed: std::env::args() calls the getter functions whenever it needs the values, as you noticed on macOS.
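
The first case can be sketched as two phases: store the pointers once at startup, then read them back whenever an args()-style function is called. This is a simplified model that assumes atomics for the statics; the names ARGC, ARGV, init and args only mirror the ones mentioned above, and the real std code differs in detail (for example, per-platform cfgs and the glibc init_array hook).

```rust
use std::ffi::{c_char, CStr, OsString};
use std::os::unix::ffi::OsStringExt;
use std::ptr;
use std::sync::atomic::{AtomicIsize, AtomicPtr, Ordering};

// Hypothetical globals modeled on the ARGC/ARGV statics described above.
static ARGC: AtomicIsize = AtomicIsize::new(0);
static ARGV: AtomicPtr<*const c_char> = AtomicPtr::new(ptr::null_mut());

/// Phase 1: called once at startup with the values the entry point received.
///
/// # Safety
/// `argv` must point to at least `argc` valid C strings and stay valid
/// for as long as `args()` may be called.
unsafe fn init(argc: isize, argv: *const *const c_char) {
    ARGC.store(argc, Ordering::Relaxed);
    ARGV.store(argv as *mut *const c_char, Ordering::Relaxed);
}

/// Phase 2: called later, on demand, to read the stored values back.
fn args() -> Vec<OsString> {
    let argv = ARGV.load(Ordering::Relaxed) as *const *const c_char;
    let argc = if argv.is_null() { 0 } else { ARGC.load(Ordering::Relaxed) };
    (0..argc.max(0) as usize)
        .map(|i| {
            // SAFETY: relies on the contract of `init`.
            let bytes = unsafe { CStr::from_ptr(*argv.add(i)) }.to_bytes().to_vec();
            OsString::from_vec(bytes)
        })
        .collect()
}
```

init is unsafe because args() later dereferences whatever pointers it stored. In a real process on Linux those pointers refer to the initial stack prepared by the kernel, so they remain valid for the program's lifetime.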

Some platforms, such as macOS (and apparently Windows), support both methods: argc/argv are passed as arguments to _start and are also available through getter functions callable from anywhere. The getters are what Rust uses there.

Linux uses only the first method. It wouldn't be surprising if glibc provided some functions to get these values (through some wibbly-wobbly magic), but the standard way is the first one.
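
As an aside, the Linux kernel does expose a process's arguments "globally" through the /proc/self/cmdline file, which contains each argument followed by a NUL byte. This is a different mechanism from what std uses, and the file can differ from the original argv if the process rewrites its argument area. Still, a minimal sketch of reading it (parse_cmdline and read_own_cmdline are hypothetical names):

```rust
use std::ffi::OsString;
use std::io;
use std::os::unix::ffi::OsStringExt;

/// Splits raw `/proc/<pid>/cmdline` contents (each argument followed by
/// a NUL byte) into individual arguments.
fn parse_cmdline(raw: &[u8]) -> Vec<OsString> {
    // Drop the final terminator first; otherwise `split` would yield a
    // spurious empty argument at the end.
    let body = match raw.strip_suffix(b"\0") {
        Some(body) => body,
        None if raw.is_empty() => return Vec::new(),
        None => raw,
    };
    body.split(|&b| b == 0)
        .map(|arg| OsString::from_vec(arg.to_vec()))
        .collect()
}

/// Reads the current process's own arguments via procfs (Linux only).
fn read_own_cmdline() -> io::Result<Vec<OsString>> {
    let raw = std::fs::read("/proc/self/cmdline")?;
    Ok(parse_cmdline(&raw))
}
```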

For further reading, here are some articles about the "program loader" on Linux (sadly, there isn't much on the subject in general, especially for other OSes):

  • LWN article "How programs get run: ELF binaries": https://lwn.net/Articles/631631/ (especially the "Populating the stack" part)
  • "The start attribute" section in one article of the "Rust OS dev" series: https://os.phil-opp.com/freestanding-rust-binary/#the-start-attribute
  • Reply to a (too broad, closed) Stack Overflow question about program loading and running: https://stackoverflow.com/a/32689330/1498917

KokaKiwi answered Oct 20 '22 00:10