 

Getting a UTF-8 encoded String from a Rust DLL in C#


I found a lot of information on passing US-ANSI strings from a Rust DLL to C#, but none of it helps with UTF-8 encoded strings.

For example, "Brötchen", once returned to C#, comes out as "BrÃ¶tchen".

Rust

use std::os::raw::c_char;
use std::ffi::CString;

#[no_mangle]
pub extern "C" fn string_test() -> *mut c_char {
    // CString appends a NUL terminator; the text itself stays UTF-8 encoded.
    let c_to_print = CString::new("Brötchen")
        .expect("CString::new failed!");
    c_to_print.into_raw()
}
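To see what string_test actually hands across the boundary, here is a small sketch (not part of the original question): a CString holds the UTF-8 bytes of the literal plus a trailing NUL, so "ö" arrives in C# as the two bytes 0xC3 0xB6. The helper name utf8_bytes_with_nul is hypothetical.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns the bytes a CString built from `s` holds,
/// including the trailing NUL terminator that the C# side scans for.
fn utf8_bytes_with_nul(s: &str) -> Vec<u8> {
    let c = CString::new(s).expect("string contains an interior NUL");
    c.as_bytes_with_nul().to_vec()
}

fn main() {
    let bytes = utf8_bytes_with_nul("Brötchen");
    // 8 characters, but 'ö' needs two bytes in UTF-8, plus the NUL: 10 bytes.
    println!("{} bytes: {:02X?}", bytes.len(), bytes);
}
```

Whatever decoder C# uses has to turn exactly these bytes back into text.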

C#

[DllImport(@"C:\Users\User\source\repos\testlib\target\debug\testlib.dll")]
private static extern IntPtr string_test();

public static void run()
{
    var s = string_test();
    var res = Marshal.PtrToStringAnsi(s);
    // var res = Marshal.PtrToStringUni(s);
    // var res = Marshal.PtrToStringAuto(s);
    // Both of these result in: ????n
    Console.WriteLine(res); // prints "BrÃ¶tchen", expected "Brötchen"
}
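The "????n" from the commented-out variants can be explained the same way: PtrToStringUni (and PtrToStringAuto, which means UTF-16 on Windows) treats the buffer as UTF-16 and pairs up the UTF-8 bytes. The sketch below reproduces that misreading on the ten bytes of the CString, assuming little-endian byte order as on x86/x64 Windows. The helper misread_as_utf16 is hypothetical, and unlike the real call it stops at the end of the buffer instead of reading on until it finds two zero bytes.

```rust
/// Hypothetical helper: reinterprets a byte buffer as little-endian UTF-16
/// code units, stopping at a 0x0000 unit or at the end of the buffer.
fn misread_as_utf16(bytes: &[u8]) -> String {
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .take_while(|&u| u != 0)
        .collect();
    String::from_utf16_lossy(&units)
}

fn main() {
    // The same bytes CString::new("Brötchen") produces, including the NUL.
    let bytes = "Brötchen\0".as_bytes();
    let text = misread_as_utf16(bytes);
    // Four CJK/Hangul characters followed by 'n'; a console without
    // glyphs for them shows each as '?'.
    println!("{} ({} chars)", text, text.chars().count());
}
```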

How do I get the desired result?

I do not think this is a duplicate of How can I transform string to UTF-8 in C#?, because its answers produce the same result as Marshal.PtrToStringAuto(s) and Marshal.PtrToStringUni(s).

asked Feb 15 '19 at 10:02 by valerius21




2 Answers

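The answer is to use Marshal.PtrToStringUTF8, which decodes the NUL-terminated buffer as UTF-8 (it is available in .NET Core and .NET 5+, but not in .NET Framework). Simplified: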

use std::ffi::CString;
use std::os::raw::c_char;

#[no_mangle]
pub extern "C" fn string_test() -> *mut c_char {
    let s = CString::new("Brötchen").expect("CString::new failed!");
    s.into_raw()
}
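One thing the snippet above leaves out is that into_raw hands ownership to the caller, so the string leaks unless the pointer comes back to Rust to be freed (it must not be released by the .NET allocator). Below is a minimal sketch of a companion export. The name string_free is hypothetical. On the C# side it would be declared as [DllImport(RUSTLIB)] static extern void string_free(IntPtr s); and called after PtrToStringUTF8 has copied the text. The main function only simulates that caller in Rust.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

#[no_mangle]
pub extern "C" fn string_test() -> *mut c_char {
    let s = CString::new("Brötchen").expect("CString::new failed!");
    s.into_raw()
}

/// Hypothetical companion export: takes back a pointer produced by
/// `string_test` and frees it with Rust's allocator. Null is ignored.
#[no_mangle]
pub extern "C" fn string_free(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    // SAFETY: `s` must come from `CString::into_raw` and be freed only once.
    unsafe { drop(CString::from_raw(s)) };
}

fn main() {
    // Simulate the C# caller: get the pointer, copy the text out, then free it.
    let p = string_test();
    let text = unsafe { CStr::from_ptr(p) }.to_str().unwrap().to_owned();
    string_free(p);
    println!("{}", text);
}
```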

Then C#

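// RUSTLIB is a const string holding the path to the Rust DLL.
[DllImport(RUSTLIB)] static extern IntPtr string_test();
//...        
var encodeText = string_test();
// Read the bytes up to the NUL terminator and decode them as UTF-8.
var text = Marshal.PtrToStringUTF8(encodeText);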

Console.WriteLine("Decode String : {0}", text);
answered Sep 20 '22 at 22:09 by Sith2021


Thanks to @E_net4's comment recommending the Rust FFI Omnibus, I arrived at a solution that is rather complicated, but works.

It turned out that I had to rewrite the classes I was using. I also use the libc crate and CString.

Cargo.toml

[package]
name = "testlib"
version = "0.1.0"
authors = ["John Doe <[email protected]>"]
edition = "2018"

[lib]
crate-type = ["cdylib"]

[dependencies]
libc = "0.2.48"

src/lib.rs

use libc::c_char;
use std::ffi::{CStr, CString};

// Takes a NUL-terminated string from C# and copies it into a Rust String.
// Panics if the pointer is null or the bytes are not valid UTF-8.
fn mkstr(s: *const c_char) -> String {
    assert!(!s.is_null());
    let c_str = unsafe { CStr::from_ptr(s) };

    let r_str = c_str.to_str()
        .expect("Could not convert string from foreign code!");

    String::from(r_str)
}
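mkstr panics if the incoming bytes are not valid UTF-8. That happens when the C# side marshals the argument with the default ANSI conversion: on a Western Windows code page, "ö" becomes the single byte 0xF6. The sketch below is a non-panicking variant that returns None instead, so an export could report an error rather than panicking inside an extern "C" function. The name try_mkstr is hypothetical.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical non-panicking variant of `mkstr`: returns `None` for a null
/// pointer or for bytes that are not valid UTF-8.
fn try_mkstr(s: *const c_char) -> Option<String> {
    if s.is_null() {
        return None;
    }
    // SAFETY: the caller guarantees `s` points to a NUL-terminated buffer.
    let c_str = unsafe { CStr::from_ptr(s) };
    c_str.to_str().ok().map(String::from)
}

fn main() {
    let utf8 = b"Br\xC3\xB6tchen\0"; // "Brötchen" encoded as UTF-8
    let ansi = b"Br\xF6tchen\0"; // "Brötchen" encoded as Windows-1252
    println!("{:?}", try_mkstr(utf8.as_ptr() as *const c_char));
    println!("{:?}", try_mkstr(ansi.as_ptr() as *const c_char));
}
```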


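// Frees a string previously returned by `result`; takes the string pointer as input.
#[no_mangle]
pub extern "C" fn free_string(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    unsafe {
        // Retake ownership so the CString is dropped and its memory released.
        drop(CString::from_raw(s));
    }
}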

// Takes the foreign C# string as input, converts it to a Rust String,
// and returns it as a raw CString pointer that must be passed to free_string.
#[no_mangle]
pub extern "C" fn result(istr: *const c_char) -> *mut c_char {
    let s = mkstr(istr);
    let cex = CString::new(s)
        .expect("Failed to create CString!");

    cex.into_raw()
}

C# Class

using System;
using System.Text;
using System.Runtime.InteropServices;


namespace Testclass
{
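    internal class Native
    {
        // Rust's extern "C" functions use the C calling convention.
        [DllImport("testlib.dll", CallingConvention = CallingConvention.Cdecl)]
        internal static extern void free_string(IntPtr str);

        // Marshal the argument as UTF-8; the default ANSI conversion sends
        // bytes that CStr::to_str rejects for non-ASCII input.
        [DllImport("testlib.dll", CallingConvention = CallingConvention.Cdecl)]
        internal static extern StringHandle result(
            [MarshalAs(UnmanagedType.LPUTF8Str)] string inputstr);
    }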

    internal class StringHandle : SafeHandle
    {
        public StringHandle() : base(IntPtr.Zero, true) { }

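        public override bool IsInvalid
        {
            // A null pointer means there is nothing to free.
            get { return handle == IntPtr.Zero; }
        }

        public string AsString()
        {
            // Find the NUL terminator written by CString.
            int len = 0;
            while (Marshal.ReadByte(handle, len) != 0) { ++len; }
            // Copy the bytes and decode them as UTF-8, not with the ANSI code page.
            byte[] buffer = new byte[len];
            Marshal.Copy(handle, buffer, 0, buffer.Length);
            return Encoding.UTF8.GetString(buffer);
        }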

        protected override bool ReleaseHandle()
        {
            Native.free_string(handle);
            return true;
        }
    }

    internal class StringTesting: IDisposable
    {
        private StringHandle str;
        private string resString;
        public StringTesting(string word)
        {
            str = Native.result(word);
        }
        public override string ToString()
        {
            if (resString == null)
            {
                resString = str.AsString();
            }
            return resString;
        }
        public void Dispose()
        {
            str.Dispose();
        }
    }

    class Testclass
    {
        public static string RoundTrip(string inputstr)
        {
            // Dispose the handle so free_string runs right after the copy.
            using (var s = new StringTesting(inputstr))
            {
                return s.ToString();
            }
        }

        public static void Main()
        {
            Console.WriteLine(RoundTrip("Brötchen")); // output: Brötchen 
        }
    }
}
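AsString does two things. It walks the buffer until the first zero byte to find the length, then decodes exactly those bytes as UTF-8. The sketch below expresses the same logic in Rust over a byte slice. The name decode_nul_terminated is hypothetical, and taking a slice instead of a raw pointer is a simplification. One difference: Encoding.UTF8.GetString replaces invalid bytes with U+FFFD, while String::from_utf8 rejects them.

```rust
use std::string::FromUtf8Error;

/// Hypothetical mirror of the C# AsString method: find the NUL terminator,
/// then decode the preceding bytes as UTF-8 (strictly, not lossily).
fn decode_nul_terminated(buffer: &[u8]) -> Result<String, FromUtf8Error> {
    // Equivalent of the `while (Marshal.ReadByte(handle, len) != 0) ++len;` loop;
    // a slice without a NUL is taken whole.
    let len = buffer.iter().position(|&b| b == 0).unwrap_or(buffer.len());
    // Equivalent of Encoding.UTF8.GetString on the copied bytes.
    String::from_utf8(buffer[..len].to_vec())
}

fn main() {
    // Bytes after the NUL are ignored, just as AsString never reads them.
    let raw = b"Br\xC3\xB6tchen\0garbage";
    match decode_nul_terminated(raw) {
        Ok(s) => println!("{}", s),
        Err(e) => println!("invalid UTF-8: {}", e),
    }
}
```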

While this achieves the desired result, I am still unsure what causes the wrong decoding in the code from the question.
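A likely cause is the decoding step. On Windows, PtrToStringAnsi interprets each byte with the system ANSI code page (Windows-1252 on Western systems), while Rust wrote UTF-8. Both bytes of "ö" (0xC3 0xB6) fall in the range where Windows-1252 agrees with ISO-8859-1, so mapping every byte to the code point of the same value reproduces the garbled output. This assumes a Windows-1252 system; other code pages produce different garbage. The helper decode_latin1 is hypothetical.

```rust
/// Hypothetical helper: decodes bytes as ISO-8859-1, where every byte becomes
/// the code point of the same value. For 0xA0..=0xFF this matches Windows-1252.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    let utf8 = "Brötchen".as_bytes(); // what Rust sends: 0xC3 0xB6 for 'ö'
    let garbled = decode_latin1(utf8); // what a single-byte ANSI decoder produces
    println!("{}", garbled);
}
```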

answered Sep 17 '22 at 22:09 by valerius21