How do I convert a C string into a Rust string and back via FFI?

Tags:

c

rust

ffi

I'm trying to get a C string returned by a C library and convert it to a Rust string via FFI.

mylib.c

const char* hello() {
    return "Hello World!";
}

main.rs

#![feature(link_args)]

extern crate libc;
use libc::c_char;

#[link_args = "-L . -I . -lmylib"]
extern {
    fn hello() -> *c_char;
}

fn main() {
    //how do I get a str representation of hello() here?
}
Dirk asked Jun 10 '14 16:06


2 Answers

The best way to work with C strings in Rust is to use structures from the std::ffi module, namely CStr and CString.

CStr is a dynamically sized type, so it can only be used behind a pointer, much like the regular str type. You can construct a &CStr from a *const c_char with the unsafe associated function CStr::from_ptr. It is unsafe because nothing guarantees that the raw pointer is valid, that it really points to a zero-terminated C string, or that the string lives as long as the returned reference.

You can get a &str from a &CStr using its to_str() method, which returns an error if the bytes are not valid UTF-8.

Here is an example:

extern crate libc;

use libc::c_char;
use std::ffi::CStr;
use std::str;

extern {
    fn hello() -> *const c_char;
}

fn main() {
    let c_buf: *const c_char = unsafe { hello() };
    let c_str: &CStr = unsafe { CStr::from_ptr(c_buf) };
    let str_slice: &str = c_str.to_str().unwrap();
    let str_buf: String = str_slice.to_owned();  // if necessary
}
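The example above unwraps the result of to_str(), which panics if the C side ever returns bytes that are not valid UTF-8. Here is a minimal sketch of two non-panicking alternatives. The helper names c_to_string_strict and c_to_string_lossy are hypothetical, and static byte strings stand in for the pointer returned by hello() so that the code runs without a C library.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::str::Utf8Error;

/// Strict conversion: fails if the C string is not valid UTF-8.
///
/// # Safety
/// `ptr` must be non-null and point to a zero-terminated string that
/// stays valid and unmodified for the duration of the call.
pub unsafe fn c_to_string_strict(ptr: *const c_char) -> Result<String, Utf8Error> {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_str().map(|s| s.to_owned())
}

/// Lossy conversion: invalid sequences are replaced with U+FFFD.
///
/// # Safety
/// Same requirements as `c_to_string_strict`.
pub unsafe fn c_to_string_lossy(ptr: *const c_char) -> String {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_string_lossy().into_owned()
}

fn main() {
    // Stand-ins for pointers that would normally come from C (assumption).
    let good = b"Hello World!\0".as_ptr() as *const c_char;
    let bad = b"\xFFoops\0".as_ptr() as *const c_char;

    unsafe {
        println!("{:?}", c_to_string_strict(good));
        println!("{:?}", c_to_string_strict(bad).is_err());
        println!("{:?}", c_to_string_lossy(bad));
    }
}
```

Both helpers copy the data into an owned String, so the result does not depend on how long the C pointer stays valid after the call.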

You need to take into account the lifetime of your *const c_char pointers and who owns them. Depending on the C API, you may need to call a special deallocation function on the string. Arrange the conversions carefully so that no slice outlives the pointer it was built from. The fact that CStr::from_ptr returns a &CStr with an arbitrary lifetime helps here (though it is dangerous by itself). For example, you can encapsulate your C string in a structure and provide a Deref conversion so that you can use the struct as if it were a string slice:

extern crate libc;

use libc::c_char;
use std::ops::Deref;
use std::ffi::CStr;

extern "C" {
    fn hello() -> *const c_char;
    fn goodbye(s: *const c_char);
}

struct Greeting {
    message: *const c_char,
}

impl Drop for Greeting {
    fn drop(&mut self) {
        unsafe {
            goodbye(self.message);
        }
    }
}

impl Greeting {
    fn new() -> Greeting {
        Greeting { message: unsafe { hello() } }
    }
}

impl Deref for Greeting {
    type Target = str;

    fn deref<'a>(&'a self) -> &'a str {
        let c_str = unsafe { CStr::from_ptr(self.message) };
        c_str.to_str().unwrap()
    }
}
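The wrapper above trusts the C side to return a non-null, UTF-8 string. Here is a sketch of a more defensive variant, under stated assumptions: fake_hello and fake_goodbye are hypothetical stand-ins written in Rust (the real functions would come from an extern "C" block), and the FREED counter exists only to show that the deallocator runs on drop.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::str::Utf8Error;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to the stand-in deallocator (illustration only).
static FREED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-ins for the C functions `hello` and `goodbye`.
unsafe fn fake_hello() -> *const c_char {
    b"Hello World!\0".as_ptr() as *const c_char
}

unsafe fn fake_goodbye(_s: *const c_char) {
    FREED.fetch_add(1, Ordering::SeqCst);
}

pub struct Greeting {
    message: *const c_char, // invariant: never null
}

impl Greeting {
    /// Returns None if the C side handed back a null pointer.
    pub fn new() -> Option<Greeting> {
        let message = unsafe { fake_hello() };
        if message.is_null() {
            None
        } else {
            Some(Greeting { message })
        }
    }

    /// Borrows the message; the borrow cannot outlive `self`.
    pub fn as_str(&self) -> Result<&str, Utf8Error> {
        let c_str = unsafe { CStr::from_ptr(self.message) };
        c_str.to_str()
    }
}

impl Drop for Greeting {
    fn drop(&mut self) {
        // Hand the pointer back to the library that owns it.
        unsafe { fake_goodbye(self.message) };
    }
}

fn main() {
    let greeting = Greeting::new().expect("fake_hello() returned null");
    println!("{}", greeting.as_str().unwrap());
    drop(greeting);
    println!("freed {} time(s)", FREED.load(Ordering::SeqCst));
}
```

Returning a Result from as_str instead of implementing Deref is a design choice: Deref cannot report an error, so it has to panic on invalid UTF-8.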

There is also another type in this module called CString. It has the same relationship with CStr as String has with str: CString is the owned version of CStr. It "holds" the handle to the allocation of the byte data, and dropping a CString frees the memory it owns (essentially, CString wraps a heap-allocated byte buffer, and it is that buffer that gets dropped). Consequently, it is useful when you want to expose data allocated in Rust as a C string.

Unfortunately, C strings always end with a zero byte and cannot contain one inside them, while Rust &[u8]/Vec<u8> are exactly the opposite: they do not end with a zero byte and can contain any number of them. This means that going from Vec<u8> to CString is neither error-free nor allocation-free: the CString constructor checks for zeros inside the data you provide, returning an error if it finds one, and appends a zero byte to the end of the byte vector, which may require reallocating it.

Like String, which implements Deref<Target = str>, CString implements Deref<Target = CStr>, so you can call methods defined on CStr directly on CString. This is important because the as_ptr() method that returns the *const c_char necessary for C interoperation is defined on CStr. You can call this method directly on CString values, which is convenient.
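One pitfall follows from as_ptr() borrowing from the CString: the pointer is only valid while the CString is alive, so the CString should be bound to a variable that outlives every use of the pointer. Here is a minimal sketch. The consumer c_side_len is a hypothetical stand-in for a C function such as strlen, written in Rust so that the example runs on its own.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that reads a zero-terminated string.
///
/// # Safety
/// `s` must point to a valid zero-terminated string.
pub unsafe extern "C" fn c_side_len(s: *const c_char) -> usize {
    let c_str = unsafe { CStr::from_ptr(s) };
    c_str.to_bytes().len()
}

/// Passes `text` to the stand-in C function and returns its answer,
/// or None if `text` contains an interior zero byte.
pub fn rust_len_via_c(text: &str) -> Option<usize> {
    // Keep the CString in a named binding: the pointer borrows from it.
    let owned = CString::new(text).ok()?;
    // as_ptr() is defined on CStr and reached through Deref.
    let ptr: *const c_char = owned.as_ptr();
    let len = unsafe { c_side_len(ptr) };
    // `owned` is dropped at the end of the function, after the last use of `ptr`.
    Some(len)
}

fn main() {
    println!("{:?}", rust_len_via_c("hello"));
    println!("{:?}", rust_len_via_c("he\0llo"));

    // Anti-pattern (left commented out on purpose): the temporary CString is
    // dropped at the end of the statement, leaving a dangling pointer.
    // let dangling = CString::new("hello").unwrap().as_ptr();
}
```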

CString can be created from anything that can be converted to Vec<u8>. String, &str, Vec<u8> and &[u8] are all valid arguments for the constructor function, CString::new(). Naturally, if you pass a byte slice or a string slice, a new allocation is created, while a Vec<u8> or String is consumed.

extern crate libc;

use libc::c_char;
use std::ffi::CString;

fn main() {
    let c_str_1 = CString::new("hello").unwrap(); // from a &str, creates a new allocation
    let c_str_2 = CString::new(b"world" as &[u8]).unwrap(); // from a &[u8], creates a new allocation
    let data: Vec<u8> = b"12345678".to_vec(); // from a Vec<u8>, consumes it
    let c_str_3 = CString::new(data).unwrap();

    // and now you can obtain a pointer to a valid zero-terminated string
    // make sure you don't use it after c_str_2 is dropped
    let c_ptr: *const c_char = c_str_2.as_ptr();

    // the following will print an error message because the source data
    // contains zero bytes
    let data: Vec<u8> = vec![1, 2, 3, 0, 4, 5, 0, 6];
    match CString::new(data) {
        Ok(c_str_4) => println!("Got a C string: {:p}", c_str_4.as_ptr()),
        Err(e) => println!("Error getting a C string: {}", e),
    }
}
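When CString::new fails, the error value still owns the original bytes and knows where the offending zero byte is, so you can recover instead of just printing a message. Here is a sketch of one possible policy, truncating at the first zero byte. The helper name to_cstring_truncating is hypothetical, and truncation is only one of several reasonable choices.

```rust
use std::ffi::CString;

/// Builds a CString from arbitrary bytes, truncating at the first interior
/// zero byte instead of failing.
pub fn to_cstring_truncating(bytes: Vec<u8>) -> CString {
    match CString::new(bytes) {
        Ok(c_string) => c_string,
        Err(err) => {
            let cut = err.nul_position(); // index of the first zero byte
            let mut bytes = err.into_vec(); // recover the original buffer
            bytes.truncate(cut);
            // No zero bytes remain before `cut`, so this cannot fail.
            CString::new(bytes).expect("no interior zero byte after truncation")
        }
    }
}

fn main() {
    let c_string = to_cstring_truncating(vec![1, 2, 3, 0, 4, 5, 0, 6]);
    // Contents without the terminator, then with it.
    println!("{:?}", c_string.as_bytes());
    println!("{:?}", c_string.as_bytes_with_nul());
}
```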

If you need to transfer ownership of the CString to C code, you can call CString::into_raw. You are then required to get the pointer back and free it in Rust: the C code must not pass it to free, because the Rust allocator is unlikely to be the same as the allocator used by malloc and free. All you need to do is call CString::from_raw on the pointer and then allow the string to be dropped normally.
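Here is a sketch of that round trip as a pair of functions with the C calling convention. The names greeting_new and greeting_free are hypothetical, and a real library would also export them under unmangled symbol names. In this example main plays the role of the C caller.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands a Rust-allocated string to the caller, or null on failure.
pub extern "C" fn greeting_new() -> *mut c_char {
    match CString::new("Hello from Rust") {
        Ok(s) => s.into_raw(), // ownership moves to the caller
        Err(_) => std::ptr::null_mut(),
    }
}

/// Takes the string back and frees it with the Rust allocator.
///
/// # Safety
/// `ptr` must be null or a pointer obtained from `greeting_new` that has
/// not been freed yet.
pub unsafe extern "C" fn greeting_free(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    // Rebuild the CString so that its destructor releases the memory.
    drop(unsafe { CString::from_raw(ptr) });
}

fn main() {
    let raw = greeting_new();
    // The caller may read the string while it holds the pointer.
    let c_str = unsafe { CStr::from_ptr(raw) };
    let text = c_str.to_str().unwrap().to_owned();
    println!("{}", text);
    // Return the pointer to Rust instead of calling free() on it.
    unsafe { greeting_free(raw) };
}
```

Note that the caller should not change the length of the string while it holds the pointer, since CString::from_raw recomputes the length from the terminator.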

Vladimir Matveev answered Sep 28 '22 05:09


In addition to what @vladimir-matveev has said, you can also convert between them without the aid of CStr or CString:

#![feature(link_args)]

extern crate libc;
use libc::{c_char, puts, strlen};
use std::{slice, str};

#[link_args = "-L . -I . -lmylib"]
extern "C" {
    fn hello() -> *const c_char;
}

fn main() {
    //converting a C string into a Rust string:
    let s = unsafe {
        let c_s = hello();
        str::from_utf8_unchecked(slice::from_raw_parts(c_s as *const u8, strlen(c_s)+1))
    };
    println!("s == {:?}", s);
    //and back:
    unsafe {
        puts(s.as_ptr() as *const c_char);
    }
}

Just make sure that when converting from a &str to a C string, your &str ends with '\0'. Notice that in the code above I use strlen(c_s)+1 instead of strlen(c_s), so s is "Hello World!\0", not just "Hello World!".
(Of course in this particular case it works even with just strlen(c_s). But with a fresh &str you couldn't guarantee that the resulting C string would terminate where expected.)
Here's the result of running the code:

s == "Hello World!\u{0}"
Hello World!
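The requirement that the &str end with '\0' can be checked rather than assumed. Here is a minimal sketch using CStr::from_bytes_with_nul from the standard library, which accepts a byte slice only if it ends with a zero byte and contains no other zero bytes. The helper name as_c_str is hypothetical.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Views `s` as a C string if, and only if, it ends with exactly one zero
/// byte and has none before it. No copy is made.
pub fn as_c_str(s: &str) -> Option<&CStr> {
    CStr::from_bytes_with_nul(s.as_bytes()).ok()
}

fn main() {
    for candidate in ["Hello World!\0", "Hello World!", "Hello\0World!\0"] {
        match as_c_str(candidate) {
            Some(c_str) => {
                // This pointer is valid for C to read while `candidate` lives.
                let ptr: *const c_char = c_str.as_ptr();
                println!("ok: {:?} at {:p}", c_str, ptr);
            }
            None => println!("rejected: {:?}", candidate),
        }
    }
}
```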
Des Nerger answered Sep 28 '22 05:09