How to make SWIG deal with utf8 strings in C#?

Tags: c#, utf-8, swig

I'm writing a portable C++ library with bindings to other languages (Java, C#, Python). I'm generating those bindings with SWIG.

I have a class written in C++:

class MyClass
{
public:
    const char* get_value() const;              // returns a UTF-8 string
    void        set_value(const char* value);   // takes a UTF-8 string
private:
    // ...
};

And I have something like this on the C# side:

public class MyClass
{
    public string get_value();
    public void   set_value(string value);
}

SWIG handles everything well, except that it doesn't perform a UTF-8 <=> UTF-16 string conversion during calls to MyClass.

What can I do about that? Writing custom typemaps looks a bit complicated, and I would need help with that if it is the only available solution.
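To make the symptom concrete, here is a minimal C# sketch (not SWIG-generated code) of what happens when UTF-8 bytes are decoded with a single-byte code page, which is roughly what default string marshaling does on a system whose ANSI code page is not UTF-8. The class and method names are made up for illustration, and ISO-8859-1 is only a stand-in for whatever code page is actually in use.

```csharp
using System;
using System.Text;

// Hypothetical demo class, for illustration only.
public static class MojibakeDemo
{
    // Encodes s as UTF-8, then decodes those bytes as ISO-8859-1,
    // mimicking a marshaler that is not aware of UTF-8.
    public static string AsLatin1(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8);
    }

    // Encodes s as UTF-8, then decodes the same bytes as UTF-8 (the intended behaviour).
    public static string AsUtf8(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(utf8);
    }
}
```

With this sketch, `MojibakeDemo.AsLatin1("\u00E9")` yields the two-character string "\u00C3\u00A9" (displayed as "Ã©"), because the UTF-8 encoding of "é" is the two bytes 0xC3 0xA9. `MojibakeDemo.AsUtf8("\u00E9")` returns the original string. Pure ASCII strings come through unchanged either way, which is why the problem only shows up with non-ASCII text.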

asked May 25 '13 at 13:05 by Aleksei


1 Answer

I was able to answer my own very similar question as follows. I think (although I haven't tested it) that the same approach should work for your situation. The only difference is that I'm using std::string and you're using char*. SWIG maps both to a C# string, but note that the char* typemaps are defined in csharp.swg rather than std_string.i, so the equivalent change would presumably have to be made there.

With the help (read: genius!) of David Jeske in the linked Code Project article, I have finally been able to answer this question.

You'll need this class (from David Jeske's code) in your C# library.

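using System;
using System.Runtime.InteropServices;
using System.Text;

// Compile with /unsafe: MarshalNativeToManaged uses pointers.
public class UTF8Marshaler : ICustomMarshaler {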
    static UTF8Marshaler static_instance;

    public IntPtr MarshalManagedToNative(object managedObj) {
        if (managedObj == null)
            return IntPtr.Zero;
        if (!(managedObj is string))
            throw new MarshalDirectiveException(
                   "UTF8Marshaler must be used on a string.");

        // not null terminated
        byte[] strbuf = Encoding.UTF8.GetBytes((string)managedObj); 
        IntPtr buffer = Marshal.AllocHGlobal(strbuf.Length + 1);
        Marshal.Copy(strbuf, 0, buffer, strbuf.Length);

        // write the terminating null
        Marshal.WriteByte(buffer + strbuf.Length, 0); 
        return buffer;
    }

    public unsafe object MarshalNativeToManaged(IntPtr pNativeData) {
        // a null pointer maps to a null string
        if (pNativeData == IntPtr.Zero)
            return null;
        byte* walk = (byte*)pNativeData;

        // find the end of the string
        while (*walk != 0) {
            walk++;
        }
        int length = (int)(walk - (byte*)pNativeData);

        // copy the bytes, excluding the trailing null
        byte[] strbuf = new byte[length];
        Marshal.Copy(pNativeData, strbuf, 0, length);
        return Encoding.UTF8.GetString(strbuf);
    }

    public void CleanUpNativeData(IntPtr pNativeData) {
        Marshal.FreeHGlobal(pNativeData);            
    }

    public void CleanUpManagedData(object managedObj) {
    }

    public int GetNativeDataSize() {
        return -1;
    }

    public static ICustomMarshaler GetInstance(string cookie) {
        if (static_instance == null) {
            return static_instance = new UTF8Marshaler();
        }
        return static_instance;
    }
}
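The core of what the marshaler does can be sketched without `unsafe` code. The following is a minimal, self-contained illustration rather than part of David Jeske's class. `Utf8RoundTrip` and its method names are hypothetical, and it only uses standard `Marshal` and `Encoding` calls.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Utf8RoundTrip
{
    // Managed string -> NUL-terminated UTF-8 buffer in unmanaged memory.
    // The caller owns the buffer and must release it with Marshal.FreeHGlobal.
    public static IntPtr ToNativeUtf8(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        IntPtr buffer = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, buffer, bytes.Length);
        Marshal.WriteByte(buffer, bytes.Length, 0); // terminating NUL
        return buffer;
    }

    // NUL-terminated UTF-8 buffer -> managed string.
    public static string FromNativeUtf8(IntPtr p)
    {
        if (p == IntPtr.Zero)
            return null;

        // Count bytes up to (not including) the terminating NUL.
        int length = 0;
        while (Marshal.ReadByte(p, length) != 0)
            length++;

        byte[] bytes = new byte[length];
        Marshal.Copy(p, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }

    // Sends a string through both conversions and reports whether it survived.
    public static bool RoundTrips(string s)
    {
        IntPtr native = ToNativeUtf8(s);
        try { return FromNativeUtf8(native) == s; }
        finally { Marshal.FreeHGlobal(native); }
    }
}
```

For example, `Utf8RoundTrip.RoundTrips("h\u00E9llo \u4E16\u754C")` should return true. The two conversion methods correspond to MarshalManagedToNative and MarshalNativeToManaged in the class above. The only difference is that the terminator is located with `Marshal.ReadByte` instead of a raw pointer walk.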

Then, in SWIG's std_string.i, replace this line (line 24 in my copy):

%typemap(imtype) string "string"

with this line:

%typemap(imtype, inattributes="[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8Marshaler))]", outattributes="[return: MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8Marshaler))]") string "string"

and replace this line (line 61 in my copy):

%typemap(imtype) const string & "string"

with this line:

%typemap(imtype, inattributes="[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8Marshaler))]", outattributes="[return: MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8Marshaler))]") const string & "string"

Lo and behold, everything works. Read the linked article for a good understanding of how this works.

answered Sep 24 '22 at 02:09 by Boinst