I know that the entry point of a Rust application is generated dynamically by rustc. I inspected the code in compiler/rustc_codegen_ssa/src/base.rs, part of which is shown below.
fn create_entry_fn<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    rust_main: Bx::Value,
    rust_main_def_id: DefId,
    use_start_lang_item: bool,
) -> Bx::Function {
    // The entry function is either `int main(void)` or `int main(int argc, char **argv)`,
    // depending on whether the target needs `argc` and `argv` to be passed in.
    let llfty = if cx.sess().target.main_needs_argc_argv {
        cx.type_func(&[cx.type_int(), cx.type_ptr_to(cx.type_i8p())], cx.type_int())
    } else {
        cx.type_func(&[], cx.type_int())
    };
In the same file I found something interesting, shown below. From the comment, it looks like this is where Rust collects the incoming argc and argv, and both values are later passed to the lang_start function, if I understand correctly.
/// Obtain the `argc` and `argv` values to pass to the rust start function.
fn get_argc_argv<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    bx: &mut Bx,
) -> (Bx::Value, Bx::Value) {
    if cx.sess().target.main_needs_argc_argv {
        // Params from native `main()` used as args for rust start function
        let param_argc = bx.get_param(0);
        let param_argv = bx.get_param(1);
        let arg_argc = bx.intcast(param_argc, cx.type_isize(), true);
        let arg_argv = param_argv;
        (arg_argc, arg_argv)
    } else {
        // The Rust start function doesn't need `argc` and `argv`, so just pass zeros.
        let arg_argc = bx.const_int(cx.type_int(), 0);
        let arg_argv = bx.const_null(cx.type_ptr_to(cx.type_i8p()));
        (arg_argc, arg_argv)
    }
}
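The forwarding that the comments in get_argc_argv describe can be imitated in ordinary Rust. The following is only a rough sketch, not what rustc emits: `rust_start`, `c_style_main`, `c_style_main_no_args` and `demo_forwarding` are hypothetical names made up for illustration, and the "start function" here just decodes the arguments into strings so the result can be inspected.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};
use std::ptr;

/// Hypothetical stand-in for the Rust start function: it receives `argc` as
/// an `isize` and `argv` as a raw pointer, and decodes them into strings.
///
/// Safety: `argv` must point to at least `argc` valid NUL-terminated strings.
unsafe fn rust_start(argc: isize, argv: *const *const c_char) -> Vec<String> {
    (0..argc)
        .map(|i| unsafe { CStr::from_ptr(*argv.offset(i)).to_string_lossy().into_owned() })
        .collect()
}

/// Hypothetical C-style entry for targets where `main` receives `argc`/`argv`:
/// widen `argc` from a C int to `isize` and pass `argv` through unchanged.
unsafe fn c_style_main(argc: c_int, argv: *const *const c_char) -> Vec<String> {
    let arg_argc = argc as isize; // signed widening, like the intcast in the excerpt
    unsafe { rust_start(arg_argc, argv) }
}

/// Hypothetical entry for targets whose `main` takes no parameters:
/// the start function gets a zero count and a null pointer.
fn c_style_main_no_args() -> Vec<String> {
    // Sound because a zero count means the null pointer is never dereferenced.
    unsafe { rust_start(0, ptr::null()) }
}

/// Builds a fake, NULL-terminated argv and runs it through the entry sketch.
fn demo_forwarding() -> Vec<String> {
    let owned: Vec<CString> = ["prog", "--flag"]
        .iter()
        .map(|s| CString::new(*s).unwrap())
        .collect();
    let mut argv: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
    argv.push(ptr::null());
    // `owned` is still alive here, so every pointer in `argv` is valid.
    unsafe { c_style_main((argv.len() - 1) as c_int, argv.as_ptr()) }
}
```

Calling `demo_forwarding()` returns the two fake arguments as strings, while `c_style_main_no_args()` returns an empty list, which mirrors the two branches of the excerpt.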
But I also found another place that seems to do the same thing, in library/std/src/sys/unix/args.rs. For example, when a Rust app runs on macOS, Rust seems to use two FFI functions (_NSGetArgc / _NSGetArgv) to retrieve argc and argv:
#[cfg(any(target_os = "macos", target_os = "ios"))]
mod imp {
    use super::Args;
    use crate::ffi::CStr;
    pub unsafe fn init(_argc: isize, _argv: *const *const u8) {}
    pub fn cleanup() {}
    #[cfg(target_os = "macos")]
    pub fn args() -> Args {
        use crate::os::unix::prelude::*;
        extern "C" {
            // These functions are in crt_externs.h.
            fn _NSGetArgc() -> *mut libc::c_int;
            fn _NSGetArgv() -> *mut *mut *mut libc::c_char;
        }
        let vec = unsafe {
            let (argc, argv) =
                (*_NSGetArgc() as isize, *_NSGetArgv() as *const *const libc::c_char);
            (0..argc as isize)
                .map(|i| {
                    let bytes = CStr::from_ptr(*argv.offset(i)).to_bytes().to_vec();
                    OsStringExt::from_vec(bytes)
                })
                .collect::<Vec<_>>()
        };
        Args { iter: vec.into_iter() }
    }
So, what's the difference between these two places? Which place actually does the real retrieval stuff?
To reply directly to the question "Which place actually does the real retrieval stuff?": well, it depends on how the platform exposes these values.
They are basically either passed as arguments to the program entry point or provided "globally" through functions/syscalls:
In the first case (passed as arguments), the Rust compiler generates an entry point that forwards them to the runtime, which initializes two static values, ARGC and ARGV (located at std/src/sys/unix/args.rs#L87). These are then read by std::env::args() for the developer to use.
Note that, depending on the libc used, this phase happens in _start and/or in some ld+libc-specific routine (it gets messy once dynamic linking is taken into account).
In the case of glibc, it is done through the non-standard GNU "init_array" extension (which is notably used for "cdylib" crates/.so executables): std/src/sys/unix/args.rs#L108-L128
Also, if you specify the entry point directly with the #[start] attribute, you get direct access to the argc/argv values (compiler/rustc_codegen_ssa/src/base.rs#L447).
In the second case, no initialization code is needed: the argument-getter functions are called by std::env::args() when needed, as you already noticed on macOS.
Some platforms, such as macOS (and apparently Windows), use both methods, providing argc/argv both as arguments to _start and through getter functions callable from anywhere, which is what Rust uses.
Linux actually uses the first case only. It wouldn't be surprising if glibc provided some functions to get these values (by some wibbly wobbly magic methods), but the standard way is the first one.
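To make the first case more concrete, here is a minimal sketch of the "store at startup, read later" pattern. It is a simplified illustration, not the actual std code: the static names ARGC and ARGV follow the ones mentioned above, while `init`, `stored_args` and `demo_stored_args` are hypothetical helpers, and the arguments are faked rather than taken from the real process.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;
use std::sync::atomic::{AtomicIsize, AtomicPtr, Ordering};

// Process-wide slots filled once at startup and read back later.
static ARGC: AtomicIsize = AtomicIsize::new(0);
static ARGV: AtomicPtr<*const c_char> = AtomicPtr::new(ptr::null_mut());

/// Hypothetical startup hook: records the values handed to the entry point.
///
/// Safety: `argv` must point to `argc` valid NUL-terminated strings that
/// stay alive for as long as `stored_args` may be called.
unsafe fn init(argc: isize, argv: *const *const c_char) {
    ARGC.store(argc, Ordering::Relaxed);
    ARGV.store(argv as *mut *const c_char, Ordering::Relaxed);
}

/// Hypothetical getter: rebuilds owned strings from the stored pointers,
/// or returns an empty list if `init` never ran.
fn stored_args() -> Vec<String> {
    let argv = ARGV.load(Ordering::Relaxed);
    let argc = if argv.is_null() { 0 } else { ARGC.load(Ordering::Relaxed) };
    (0..argc)
        .map(|i| unsafe { CStr::from_ptr(*argv.offset(i)).to_string_lossy().into_owned() })
        .collect()
}

/// Fakes a startup sequence with two arguments, then reads them back.
fn demo_stored_args() -> Vec<String> {
    // Leak the strings and the pointer array so they live for the rest of
    // the program, as a real argv does.
    let ptrs: Vec<*const c_char> = ["prog", "input.txt"]
        .iter()
        .map(|s| CString::new(*s).unwrap().into_raw() as *const c_char)
        .collect();
    let argv: &'static [*const c_char] = Box::leak(ptrs.into_boxed_slice());
    unsafe { init(argv.len() as isize, argv.as_ptr()) };
    stored_args()
}
```

In the second case there is nothing like `init` to run: the getter would simply call the platform functions (such as _NSGetArgc / _NSGetArgv) each time it is asked.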
For further reading, you can look at some links and articles about the "program loader" on Linux (sadly, there's not much on the subject in general, especially for other OSes):