Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert Rust vector of tuples to a C compatible structure

Following these answers, I've currently defined a Rust 1.0 function as follows, in order to be callable from Python using ctypes:

use std::vec;

extern crate libc;
use libc::{c_int, c_float, size_t};
use std::slice;

#[no_mangle]
pub extern fn convert_vec(input_lon: *const c_float, 
                          lon_size: size_t, 
                          input_lat: *const c_float, 
                          lat_size: size_t) -> Vec<(i32, i32)> {
    let input_lon = unsafe {
        slice::from_raw_parts(input_lon, lon_size as usize)
    };
    let input_lat = unsafe {
        slice::from_raw_parts(input_lat, lat_size as usize)
    };

    let combined: Vec<(i32, i32)> = input_lon
        .iter()
        .zip(input_lat.iter())
        .map(|each| convert(*each.0, *each.1))
        .collect();
    return combined
}

And I'm setting up the Python part like so:

from ctypes import *

class Int32_2(Structure):
    _fields_ = [("array", c_int32 * 2)]

rust_bng_vec = lib.convert_vec_py
rust_bng_vec.argtypes = [POINTER(c_float), c_size_t, 
                         POINTER(c_float), c_size_t]
rust_bng_vec.restype = POINTER(Int32_2)

This seems to be OK, but I'm:

  • Not sure how to transform combined (a Vec<(i32, i32)>) to a C-compatible structure, so it can be returned to my Python script.
  • Not sure whether I should be returning a reference (return &combined?) and how I would have to annotate the function with the appropriate lifetime specifier if I did
like image 410
urschrei Avatar asked Jun 22 '15 16:06

urschrei


1 Answers

The most important thing to note is that there is no such thing as a tuple in C. C is the lingua franca of library interoperability, and you will be required to restrict yourself to abilities of this language. It doesn't matter if you are talking between Rust and another high-level language; you have to speak C.

There may not be tuples in C, but there are structs. A two-element tuple is just a struct with two members!

Let's start with the C code that we would write:

#include <stdio.h>
#include <stdint.h>

typedef struct {
  uint32_t a;
  uint32_t b;
} tuple_t;

typedef struct {
  void *data;
  size_t len;
} array_t;

extern array_t convert_vec(array_t lat, array_t lon);

int main() {
  uint32_t lats[3] = {0, 1, 2};
  uint32_t lons[3] = {9, 8, 7};

  array_t lat = { .data = lats, .len = 3 };
  array_t lon = { .data = lons, .len = 3 };

  array_t fixed = convert_vec(lat, lon);
  tuple_t *real = fixed.data;

  for (int i = 0; i < fixed.len; i++) {
    printf("%d, %d\n", real[i].a, real[i].b);
  }

  return 0;
}

We've defined two structs — one to represent our tuple, and another to represent an array, as we will be passing those back and forth a bit.

We will follow this up by defining the exact same structs in Rust and define them to have the exact same members (types, ordering, names). Importantly, we use #[repr(C)] to let the Rust compiler know to not do anything funky with reordering the data.

extern crate libc;

use std::slice;
use std::mem;

#[repr(C)]
pub struct Tuple {
    a: libc::uint32_t,
    b: libc::uint32_t,
}

#[repr(C)]
pub struct Array {
    data: *const libc::c_void,
    len: libc::size_t,
}

impl Array {
    unsafe fn as_u32_slice(&self) -> &[u32] {
        assert!(!self.data.is_null());
        slice::from_raw_parts(self.data as *const u32, self.len as usize)
    }

    fn from_vec<T>(mut vec: Vec<T>) -> Array {
        // Important to make length and capacity match
        // A better solution is to track both length and capacity
        vec.shrink_to_fit();

        let array = Array { data: vec.as_ptr() as *const libc::c_void, len: vec.len() as libc::size_t };

        // Whee! Leak the memory, and now the raw pointer (and
        // eventually C) is the owner.
        mem::forget(vec);

        array
    }
}

#[no_mangle]
pub extern fn convert_vec(lon: Array, lat: Array) -> Array {
    let lon = unsafe { lon.as_u32_slice() };
    let lat = unsafe { lat.as_u32_slice() };

    let vec =
        lat.iter().zip(lon.iter())
        .map(|(&lat, &lon)| Tuple { a: lat, b: lon })
        .collect();

    Array::from_vec(vec)
}

We must never accept or return non-repr(C) types across the FFI boundary, so we pass across our Array. Note that there's a good amount of unsafe code, as we have to convert an unknown pointer to data (c_void) to a specific type. That's the price of being generic in C world.

Let's turn our eye to Python now. Basically, we just have to mimic what the C code did:

import ctypes

class FFITuple(ctypes.Structure):
    _fields_ = [("a", ctypes.c_uint32),
                ("b", ctypes.c_uint32)]

class FFIArray(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),
                ("len", ctypes.c_size_t)]

    # Allow implicit conversions from a sequence of 32-bit unsigned
    # integers.
    @classmethod
    def from_param(cls, seq):
        return cls(seq)

    # Wrap sequence of values. You can specify another type besides a
    # 32-bit unsigned integer.
    def __init__(self, seq, data_type = ctypes.c_uint32):
        array_type = data_type * len(seq)
        raw_seq = array_type(*seq)
        self.data = ctypes.cast(raw_seq, ctypes.c_void_p)
        self.len = len(seq)

# A conversion function that cleans up the result value to make it
# nicer to consume.
def void_array_to_tuple_list(array, _func, _args):
    tuple_array = ctypes.cast(array.data, ctypes.POINTER(FFITuple))
    return [tuple_array[i] for i in range(0, array.len)]

lib = ctypes.cdll.LoadLibrary("./target/debug/libtupleffi.dylib")

lib.convert_vec.argtypes = (FFIArray, FFIArray)
lib.convert_vec.restype = FFIArray
lib.convert_vec.errcheck = void_array_to_tuple_list

for tupl in lib.convert_vec([1,2,3], [9,8,7]):
    print tupl.a, tupl.b

Forgive my rudimentary Python. I'm sure an experienced Pythonista could make this look a lot prettier! Thanks to @eryksun for some nice advice on how to make the consumer side of calling the method much nicer.

A word about ownership and memory leaks

In this example code, we've leaked the memory allocated by the Vec. Theoretically, the FFI code now owns the memory, but realistically, it can't do anything useful with it. To have a fully correct example, you'd need to add another method that would accept the pointer back from the callee, transform it back into a Vec, then allow Rust to drop the value. This is the only safe way, as Rust is almost guaranteed to use a different memory allocator than the one your FFI language is using.

Not sure whether I should be returning a reference and how I would have to annotate the function with the appropriate lifetime specifier if I did

No, you don't want to (read: can't) return a reference. If you could, then the ownership of the item would end with the function call, and the reference would point to nothing. This is why we need to do the two-step dance with mem::forget and returning a raw pointer.

like image 173
Shepmaster Avatar answered Nov 15 '22 10:11

Shepmaster