
What is the proper way to go from a String to a *const i8?

Tags: rust, ffi

In my ongoing saga of writing a safe wrapper for the Cassandra C++ driver, my eye now turns towards avoiding memory leaks when calling C functions with signatures like:

cass_string_init2(const char* data, cass_size_t length);

or

cass_string_init(const char* null_terminated);

I have tried a few different approaches that nominally work, and produce a correct result, but I haven't found a way to manage the lifetime of this data properly. Two example approaches are below.

pub fn str_to_ref(mystr:&str) -> *const i8 {unsafe{
    let cstr = CString::from_slice(mystr.as_bytes());
    cstr.as_slice().as_ptr()
}}

and

pub fn str_to_ref(mystr: &str) -> *const i8 {
    let l = mystr.as_bytes();
    unsafe {
        let b = alloc::heap::allocate(mystr.len()+1, 8);
        let s = slice::from_raw_parts_mut(b, mystr.len()+1);
        slice::bytes::copy_memory(s, l);
        s[mystr.len()] = 0;
        return b as *const i8;
    }
}

The first does invalid memory accesses like

==26355==  Address 0x782d140 is 0 bytes inside a block of size 320 free'd
==26355==    at 0x1361A8: je_valgrind_freelike_block (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==26355==    by 0x11272D: heap::imp::deallocate::h7b540039fbffea4dPha (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==26355==    by 0x112679: heap::deallocate::h3897fed87b942253tba (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==26355==    by 0x112627: vec::dealloc::h7978768019700822177 (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==26355==    by 0x112074: vec::Vec$LT$T$GT$.Drop::drop::h239007174869221309 (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==26355==    by 0x111F9D: collections..vec..Vec$LT$i8$GT$::glue_drop.5732::h978a83960ecb86a4 (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==26355==    by 0x111F6D: std..ffi..c_str..CString::glue_drop.5729::h953a595760f34a9d (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==26355==    by 0x112903: cql_ffi::helpers::str_to_ref::hef3994fa55168b90bqd (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)

while the second doesn't know when to deallocate its memory, resulting in:

==29782== 8 bytes in 1 blocks are definitely lost in loss record 1 of 115
==29782==    at 0x12A5B2: je_mallocx (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==29782==    by 0x1142D5: heap::imp::allocate::h3fa8a1c097e9ea53Tfa (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==29782==    by 0x114221: heap::allocate::h18d191ce51ab2236gaa (in /home/tupshin/workspaces/rust/cql-ffi/target/basic)
==29782==    by 0x112874: cql_ffi::helpers::str_to_ref::h5b60f207d1e31841bqd (helpers.rs:25)

Using either of those two approaches as a starting point, or something completely different, I would really appreciate some guidance on a proper way to accomplish this.

Edit:

Shep's answer perfectly solved my issues using cass_string_init and cass_string_init2. Thank you so much. However, I'm still not clear on passing *const i8 params to other functions such as:

CASS_EXPORT CassError
cass_cluster_set_contact_points(CassCluster* cluster,
const char* contact_points);

which expect to be passed a reference to a null-terminated string.

Based on the previous approach that worked for CassStrings, along with the CString docs, I came up with the following:

pub struct ContactPoints(*const c_char);

pub trait AsContactPoints {
    fn as_contact_points(&self) -> ContactPoints;
}

impl AsContactPoints for str {
    fn as_contact_points(&self) -> ContactPoints {
        let cstr = CString::new(self).unwrap();
        let bytes = cstr.as_bytes_with_nul();
        let ptr = bytes.as_ptr();
        ContactPoints(ptr as *const i8)
    }
}

(the excessive let bindings there are just to make sure I wasn't missing any subtlety)

and that runs correctly, but valgrind complains:

==22043== Invalid read of size 1
==22043==    at 0x4C2E0E2: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22043==    by 0x4F8AED8: cass_cluster_set_contact_points (in /usr/local/lib/libcassandra.so.1.0.0)
==22043==    by 0x11367A: cql_ffi::cluster::CassCluster::set_contact_points::h575496cbf7644b9e6oa (cluster.rs:76)

Tupshin Harper asked Feb 21 '15 18:02



1 Answer

The Cassandra C API for cass_string_init2 looks like:

Note: This does not allocate memory. The object wraps the pointer passed into this function.

CASS_EXPORT CassString
cass_string_init2(const char* data, cass_size_t length);

That is, it takes a string, and returns an alternate representation of a string. That representation looks like:

typedef struct CassString_ {
    const char* data;
    cass_size_t length;
} CassString;

This is where you want to use #[repr(C)] in the Rust code:

#[repr(C)]
struct CassStr {
    data: *const c_char,
    length: size_t,
}
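
A minimal sketch of a layout sanity check for such a struct. It assumes cass_size_t matches the platform's size_t (represented here as usize), which should be confirmed against the driver's headers; cass_str_layout is a hypothetical helper, not part of any library.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_char;

// Rust-side mirror of the C struct; #[repr(C)] keeps the field order
// and padding rules the same as a C compiler would use.
#[repr(C)]
pub struct CassStr {
    pub data: *const c_char,
    pub length: usize, // assumption: cass_size_t is the platform's size_t
}

// Hypothetical helper: returns (size, alignment) of the Rust-side struct
// so it can be compared with what the C header implies.
pub fn cass_str_layout() -> (usize, usize) {
    (size_of::<CassStr>(), align_of::<CassStr>())
}
```

On common platforms this should report two pointer-sized words with pointer alignment, matching a C struct holding a const char* and a size_t.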

The nicest thing you could do is make strings automatically convert to this struct:

trait AsCassStr {
    fn as_cass_str(&self) -> CassStr;
}

impl AsCassStr for str {
    fn as_cass_str(&self) -> CassStr {
        CassStr {
            data: self.as_ptr() as *const c_char, // pointer to the UTF-8 bytes, not NUL-terminated
            length: self.len() as size_t,         // length in bytes
        }
    }
}

And then have your API accept anything that implements AsCassStr. This allows you to have owned variants as well. You may also want to look into PhantomData, which lets the compiler enforce that a CassStr does not outlive the string it points into.
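
A sketch of how PhantomData might be used for this, under the same assumption that cass_size_t can be represented as usize. The String impl and the byte_len function are hypothetical illustrations of an owned variant and of an API that accepts any AsCassStr; they are not part of the Cassandra driver.

```rust
use std::marker::PhantomData;
use std::os::raw::c_char;

// Same layout as the C struct: PhantomData is zero-sized, so it adds
// no bytes, but it ties each CassStr to the lifetime of the borrowed data.
#[repr(C)]
pub struct CassStr<'a> {
    pub data: *const c_char,
    pub length: usize, // assumption: cass_size_t is the platform's size_t
    _borrowed: PhantomData<&'a [u8]>,
}

pub trait AsCassStr {
    fn as_cass_str<'a>(&'a self) -> CassStr<'a>;
}

impl AsCassStr for str {
    fn as_cass_str<'a>(&'a self) -> CassStr<'a> {
        CassStr {
            data: self.as_ptr() as *const c_char,
            length: self.len(),
            _borrowed: PhantomData,
        }
    }
}

// Owned variant: a String lends out its buffer the same way.
impl AsCassStr for String {
    fn as_cass_str<'a>(&'a self) -> CassStr<'a> {
        self.as_str().as_cass_str()
    }
}

// Hypothetical API entry point that accepts anything implementing AsCassStr.
pub fn byte_len<S: AsCassStr + ?Sized>(s: &S) -> usize {
    s.as_cass_str().length
}
```

With the lifetime parameter in place, code that tries to return a CassStr built from a local String should be rejected by the borrow checker instead of producing a dangling pointer at run time.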

Note: Normally you want to use CString, which rejects strings containing interior NUL bytes. However, since this API accepts a length parameter, it's possible that it supports such strings natively. You'll need to experiment to find out. If it does not, then you'll need to use CString as shown below.
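
To illustrate the interior NUL check, here is a small sketch. to_c_string is a hypothetical helper name; CString::new and NulError::nul_position are standard library calls.

```rust
use std::ffi::CString;

// Hypothetical helper: converts to a C string and reports the byte index
// of the first interior NUL instead of panicking the way unwrap() would.
pub fn to_c_string(s: &str) -> Result<CString, usize> {
    CString::new(s).map_err(|e| e.nul_position())
}
```

Returning the error lets the caller decide how to handle bad input, which is usually preferable to unwrap() in a library wrapper.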

Second half of question

Let's take a look at your function, line-by-line:

impl AsContactPoints for str {
    fn as_contact_points(&self) -> ContactPoints {
        let cstr = CString::new(self).unwrap(); // 1
        let bytes = cstr.as_bytes_with_nul();   // 2
        let ptr = bytes.as_ptr();               // 3
        ContactPoints(ptr as *const i8)         // 4
    }                                           // 5
}
  1. We create a new CString. This allocates a bit of memory somewhere, verifies that the string has no internal NUL bytes, then copies our string in, byte-for-byte, and adds a trailing zero.
  2. We get a slice that refers to the bytes that we have copied and verified in step 1. Recall that slices are a pointer to data plus a length.
  3. We convert the slice to a pointer, ignoring the length.
  4. We store the pointer in a structure, using that as our return value.
  5. The function exits, and all local variables are dropped. Note that cstr is a local variable, and so the bytes it owns are freed along with it. You now have a dangling pointer. Not good!
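
Steps 1 and 2 can be observed safely, without any raw pointers. The sketch below uses a hypothetical helper, inspect_c_string, that returns the length of the NUL-terminated slice and its final byte; everything it looks at is read before the CString is dropped.

```rust
use std::ffi::CString;

// Hypothetical helper: returns (length including the trailing NUL, last byte).
pub fn inspect_c_string(s: &str) -> (usize, u8) {
    // Step 1: allocate, check for interior NULs, copy, append a zero byte.
    let cstr = CString::new(s).unwrap();
    // Step 2: borrow the copied bytes, trailing NUL included.
    let bytes = cstr.as_bytes_with_nul();
    // Read what we need while `cstr` is still alive.
    (bytes.len(), bytes[bytes.len() - 1])
    // `cstr` is dropped here; any raw pointer taken from it would now dangle.
}
```

Because the function returns plain values rather than a pointer into cstr, nothing refers to the buffer after it is freed.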

You need to ensure that the CString lives for as long as it needs to. Hopefully the function you are calling doesn't keep a reference to it, but there's no easy way to tell from the function signature. You probably want code that looks like:

fn my_cass_call(s: &str) {
    let s = CString::new(s).unwrap();
    unsafe { cass_call(s.as_ptr()) } // FFI calls are unsafe; `s` is still alive here
}

The benefit here is that you never store the pointer in a variable of your own. Remember that raw pointers do not have lifetimes, so you have to be very careful with them!
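
If a wrapper type such as ContactPoints is still wanted, one option is for it to own the CString and hand out the pointer only on demand. The following is a sketch under that assumption; fake_c_strlen is a hypothetical stand-in for a C function that reads a NUL-terminated string (the real call would be cass_cluster_set_contact_points).

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

// Owns the C string, so the buffer lives exactly as long as this value.
pub struct ContactPoints(CString);

impl ContactPoints {
    pub fn new(s: &str) -> Result<ContactPoints, NulError> {
        CString::new(s).map(ContactPoints)
    }

    // The pointer is valid only while `self` is alive; pass it straight
    // to the C function rather than storing it.
    pub fn as_ptr(&self) -> *const c_char {
        self.0.as_ptr()
    }
}

// Hypothetical stand-in for a C function taking `const char*`.
pub unsafe extern "C" fn fake_c_strlen(p: *const c_char) -> usize {
    unsafe { CStr::from_ptr(p).to_bytes().len() }
}
```

A caller would keep the ContactPoints value in scope for the duration of the call, for example `let cp = ContactPoints::new("127.0.0.1").unwrap();` followed by `unsafe { fake_c_strlen(cp.as_ptr()) }`. This is only safe if the C function does not retain the pointer after it returns, which the signature alone cannot tell you.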

Shepmaster answered Nov 10 '22 12:11
