What's the most direct way to use a C string as Rust's Path?
I've got a const char * from FFI and need to use it as a filesystem path in Rust. Going through str/String is undesirable.
To clarify: I'm just replacing an existing C implementation that passes the path to fopen with a Rust stdlib implementation. It's not my problem whether it's a valid path or encoded properly for a given filesystem, as long as it's not worse than fopen (and I know fopen basically doesn't work on Windows).
Here's what I've learned:
Path/OsStr always use WTF-8 on Windows, and are an encoding-ignorant bag of bytes on Unix.
They never store paths in any "wide" encoding like UTF-16 or UCS-2. The Windows-only masquerade of OsStr is to hide the WTF-8 encoding, nothing more.
It is extremely unlikely to ever change, because the standard library API supports creation of Path and OsStr from UTF-8 &str without any allocation or mutation of memory (i.e. as_ref() is supported, and its strict API doesn't leave room to implement it as anything other than a pointer cast).
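To illustrate the as_ref() point, here is a minimal sketch. The helper name str_as_path is hypothetical, made up for this example; it only relies on the AsRef<Path> impl for str that the standard library provides.

```rust
use std::path::Path;

/// Hypothetical helper: views a UTF-8 string slice as a `Path`.
/// The signature takes and returns borrows, so there is no room
/// for an allocation or a copy here.
fn str_as_path(s: &str) -> &Path {
    s.as_ref()
}
```

Converting the result back with to_str() hands out a slice over the same memory, which is what you'd expect if the conversion is just a reinterpretation of the borrowed bytes.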
Unix-only zero-copy version (it doesn't even depend on any implementation details):
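use std::ffi::{CStr, OsStr};
use std::os::unix::ffi::OsStrExt;
use std::path::Path;
// SAFETY: the pointer must be non-null and point to a NUL-terminated string
// that stays alive and unmodified for as long as `path` is used.
let slice = unsafe { CStr::from_ptr(c_null_terminated_string_ptr_here) };
// View the raw bytes as an OsStr without copying or validating them.
let osstr = OsStr::from_bytes(slice.to_bytes());
let path: &Path = osstr.as_ref();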
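The same steps could be wrapped in a function, roughly like this. It's only a sketch: the name c_str_to_path is hypothetical, and the caller-chosen lifetime 'a is an assumption that the caller keeps the C buffer alive long enough.

```rust
use std::ffi::{CStr, OsStr};
use std::os::raw::c_char;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Hypothetical helper (Unix only): borrows a C string as a `Path`
/// without copying it or checking its encoding.
///
/// # Safety
/// `ptr` must be non-null, point to a NUL-terminated string, and that
/// string must stay valid and unmodified for the whole lifetime `'a`.
unsafe fn c_str_to_path<'a>(ptr: *const c_char) -> &'a Path {
    // to_bytes() excludes the trailing NUL.
    let bytes = unsafe { CStr::from_ptr(ptr) }.to_bytes();
    Path::new(OsStr::from_bytes(bytes))
}
```

Non-UTF-8 bytes pass through untouched, so this should be no stricter than handing the same pointer to fopen.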
On Windows, converting only valid UTF-8 is the best Rust can do without a charade of creating WTF-8 OsString from code units:
…
let str = ::std::str::from_utf8(slice.to_bytes()).expect("keep your surrogates paired");
let path: &Path = str.as_ref();
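If panicking on bad input isn't acceptable, the UTF-8-only route can return an error instead. A sketch under the same assumptions: c_str_to_path_utf8 is a hypothetical name, and the lifetime 'a is whatever the caller can guarantee for the C buffer.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::path::Path;
use std::str::Utf8Error;

/// Hypothetical helper: borrows a C string as a `Path` on any platform,
/// but only if the bytes are valid UTF-8.
///
/// # Safety
/// `ptr` must be non-null, point to a NUL-terminated string, and that
/// string must stay valid and unmodified for the whole lifetime `'a`.
unsafe fn c_str_to_path_utf8<'a>(ptr: *const c_char) -> Result<&'a Path, Utf8Error> {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    // to_str() validates UTF-8 in place; Path::new then just borrows the &str.
    c_str.to_str().map(Path::new)
}
```

This still doesn't allocate; the only extra cost over the Unix version is the UTF-8 validation pass.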
Safely and portably? Insofar as I'm aware, there isn't a way. My advice is to demand UTF-8 and just pray it never breaks.
The problem is that the only thing you can really say about a "C string" is that it's NUL-terminated. You can't really say anything meaningful about how it's encoded. At least, not with any real certainty.
Unsafely and/or non-portably? If you're running on Linux (and possibly other modern *NIXen), you can maybe use OsStrExt to do the conversion. This only works assuming the C string was a valid path in the first place. If it came from some string processing code that wasn't using the same encoding as the filesystem (which these days is generally "arbitrary bytes that look like UTF-8 but might not be")... well, you'll have to convert it yourself, first.
On Windows? Hahahaha. This depends on where the string came from. C strings embedded in an executable can be in a variety of encodings depending on how the code was compiled. If it came from the OS itself, it could be in one of two different encodings: the thread's OEM codepage, or the thread's ANSI codepage. I never worked out how to check which it's set to. If it came from the console, it would be in whatever the console's input encoding was set to when you received it... assuming it wasn't piped in from something else that was using a different encoding (hi there, PowerShell!). All of the above require you to roll your own transcoding code, since Rust itself avoids this by never, ever using non-Unicode APIs on Windows.
Oh, and don't forget that there is no 8-bit encoding that can properly store Windows paths, since Windows paths are "arbitrary 16-bit words that look like UTF-16 but might not be". [1]
... so, like I said: demand UTF-8 and just pray it never breaks, because trying to do it "correctly" leads to madness.
[1]: I should clarify: there is such an encoding: WTF-8, which is what Rust uses for OsStr and OsString on Windows. The catch is that nothing else on Windows uses this, so it's never going to be how a C string is encoded.