Conversion in .net: Native Utf-8 <-> Managed String

Tags:

I created those two methods to convert Native utf-8 strings (char*) into managed string and vice versa. The following code does the job:

public IntPtr NativeUtf8FromString(string managedString)
{
    byte[] buffer = Encoding.UTF8.GetBytes(managedString); // not null terminated
    Array.Resize(ref buffer, buffer.Length + 1);
    buffer[buffer.Length - 1] = 0; // terminating 0
    IntPtr nativeUtf8 = Marshal.AllocHGlobal(buffer.Length);
    Marshal.Copy(buffer, 0, nativeUtf8, buffer.Length);
    return nativeUtf8;
}

string StringFromNativeUtf8(IntPtr nativeUtf8)
{
    int size = 0;
    byte[] buffer = {};
    do
    {
        ++size;
        Array.Resize(ref buffer, size);
        Marshal.Copy(nativeUtf8, buffer, 0, size);
    } while (buffer[size - 1] != 0); // till 0 termination found

    if (1 == size)
    {
        return ""; // empty string
    }

    Array.Resize(ref buffer, size - 1); // remove terminating 0
    return Encoding.UTF8.GetString(buffer);
}

While NativeUtf8FromString is ok, StringFromNativeUtf8 is a mess but the only safe code I could get to run. Using unsafe code I could use an byte* but I do not want unsafe code. Is there another way someone can think of where I do not have to copy the string for every contained byte to find the 0 termination.

I just add the unsave code here:

public unsafe string StringFromNativeUtf8(IntPtr nativeUtf8)
{
    byte* bytes = (byte*)nativeUtf8.ToPointer();
    int size = 0;
    while (bytes[size] != 0)
    {
        ++size;
    }
    byte[] buffer = new byte[size];
    Marshal.Copy((IntPtr)nativeUtf8, buffer, 0, size);
    return Encoding.UTF8.GetString(buffer);
}

As you see its not ugly just needs unsafe.

656

asked May 27 '12 11:05

Totonga

2 Answers

Just perform the exact same operation strlen() performs. Do consider keeping the buffer around, the code does generate garbage in a hurry.

    public static IntPtr NativeUtf8FromString(string managedString) {
        int len = Encoding.UTF8.GetByteCount(managedString);
        byte[] buffer = new byte[len + 1];
        Encoding.UTF8.GetBytes(managedString, 0, managedString.Length, buffer, 0);
        IntPtr nativeUtf8 = Marshal.AllocHGlobal(buffer.Length);
        Marshal.Copy(buffer, 0, nativeUtf8, buffer.Length);
        return nativeUtf8;
    }

    public static string StringFromNativeUtf8(IntPtr nativeUtf8) {
        int len = 0;
        while (Marshal.ReadByte(nativeUtf8, len) != 0) ++len;
        byte[] buffer = new byte[len];
        Marshal.Copy(nativeUtf8, buffer, 0, buffer.Length);
        return Encoding.UTF8.GetString(buffer);
    }

answered Oct 05 '22 08:10

Hans Passant

Slightly faster than Hans' solution (1 less buffer copy):

private unsafe IntPtr AllocConvertManagedStringToNativeUtf8(string input) {
    fixed (char* pInput = input) {
        var len = Encoding.UTF8.GetByteCount(pInput, input.Length);
        var pResult = (byte*)Marshal.AllocHGlobal(len + 1).ToPointer();
        var bytesWritten = Encoding.UTF8.GetBytes(pInput, input.Length, pResult, len);
        Trace.Assert(len == bytesWritten);
        pResult[len] = 0;
        return (IntPtr)pResult;
    }
}

private unsafe string MarshalNativeUtf8ToManagedString(IntPtr pStringUtf8)
    => MarshalNativeUtf8ToManagedString((byte*)pStringUtf8);

private unsafe string MarshalNativeUtf8ToManagedString(byte* pStringUtf8) {
    var len = 0;
    while (pStringUtf8[len] != 0) len++;
    return Encoding.UTF8.GetString(pStringUtf8, len);
}

Here's I demo round-tripping a string:

var input = "Hello, World!";
var native = AllocConvertManagedStringToNativeUtf8(input);
var copy = MarshalNativeUtf8ToManagedString(native);
Marshal.FreeHGlobal(native); // don't leak unmanaged memory!
Trace.Assert(input == copy); // prove they're equal!

answered Oct 05 '22 08:10

Warty

Related questions
                            
                                How to set the cell width in itextsharp pdf creation
                            
                                debug web service proxy class in C#
                            
                                Inconsistent accessibility
                            
                                stringbuilder versus string concat
                            
                                Assembly 'SomeAssembly, uses 'System.Web.Mvc, Version=4.0.0.0, which has a higher version than referenced assembly 'System.Web.Mvc, Version 3.0.0.0
                            
                                Roman numerals to integers
                            
                                Why does this loop through Regex groups print the output twice?
                            
                                How to get all certificates with powershell?
                            
                                Entity Framework Core Auto Generated guid
                            
                                .net collection for fast insert/delete
                            
                                What's the most efficient way to determine whether an untrimmed string is empty in C#?
                            
                                C# - Making one Int64 from two Int32s
                            
                                Accessing C# property name or attributes
                            
                                Why does object.ToString() exist?
                            
                                IP address validation
                            
                                Call C# dll from Delphi
                            
                                C# Declare variable in lambda expression
                            
                                c# Public Nested Classes or Better Option?
                            
                                Display duration in milliseconds
                            
                                Model state not valid

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Conversion in .net: Native Utf-8 <-> Managed String

Tags:

string

c#

utf-8

marshalling

native

Totonga

People also ask

2 Answers

Hans Passant

Warty

Recent Activity

Donate For Us