I created those two methods to convert Native utf-8 strings (char*) into managed string and vice versa. The following code does the job:
public IntPtr NativeUtf8FromString(string managedString)
{
byte[] buffer = Encoding.UTF8.GetBytes(managedString); // not null terminated
Array.Resize(ref buffer, buffer.Length + 1);
buffer[buffer.Length - 1] = 0; // terminating 0
IntPtr nativeUtf8 = Marshal.AllocHGlobal(buffer.Length);
Marshal.Copy(buffer, 0, nativeUtf8, buffer.Length);
return nativeUtf8;
}
string StringFromNativeUtf8(IntPtr nativeUtf8)
{
int size = 0;
byte[] buffer = {};
do
{
++size;
Array.Resize(ref buffer, size);
Marshal.Copy(nativeUtf8, buffer, 0, size);
} while (buffer[size - 1] != 0); // till 0 termination found
if (1 == size)
{
return ""; // empty string
}
Array.Resize(ref buffer, size - 1); // remove terminating 0
return Encoding.UTF8.GetString(buffer);
}
While NativeUtf8FromString is ok, StringFromNativeUtf8 is a mess but the only safe code I could get to run. Using unsafe code I could use an byte* but I do not want unsafe code. Is there another way someone can think of where I do not have to copy the string for every contained byte to find the 0 termination.
I just add the unsave code here:
public unsafe string StringFromNativeUtf8(IntPtr nativeUtf8)
{
byte* bytes = (byte*)nativeUtf8.ToPointer();
int size = 0;
while (bytes[size] != 0)
{
++size;
}
byte[] buffer = new byte[size];
Marshal.Copy((IntPtr)nativeUtf8, buffer, 0, size);
return Encoding.UTF8.GetString(buffer);
}
As you see its not ugly just needs unsafe.
UTF-8 is a Unicode encoding that represents each code point as a sequence of one to four bytes. Unlike the UTF-16 and UTF-32 encodings, the UTF-8 encoding does not require "endianness"; the encoding scheme is the same regardless of whether the processor is big-endian or little-endian.
In order to convert a String into UTF-8, we use the getBytes() method in Java. The getBytes() method encodes a String into a sequence of bytes and returns a byte array. where charsetName is the specific charset by which the String is encoded into an array of bytes.
We can use Encoding. GetString Method (Byte[]) to decodes all the bytes in the specified byte array into a string. Several other decoding schemes are also available in Encoding class such as UTF8, Unicode, UTF32, ASCII etc. The Encoding class is available as part of System.
Character and string processing in C# uses Unicode encoding. The char type represents a UTF-16 code unit, and the string type represents a sequence of UTF-16 code units. So far, so good. But that's C#.
All strings in a . NET Framework program are stored as 16-bit Unicode characters. At times you might need to convert from Unicode to some other character encoding, or from some other character encoding to Unicode.
Just perform the exact same operation strlen() performs. Do consider keeping the buffer around, the code does generate garbage in a hurry.
public static IntPtr NativeUtf8FromString(string managedString) {
int len = Encoding.UTF8.GetByteCount(managedString);
byte[] buffer = new byte[len + 1];
Encoding.UTF8.GetBytes(managedString, 0, managedString.Length, buffer, 0);
IntPtr nativeUtf8 = Marshal.AllocHGlobal(buffer.Length);
Marshal.Copy(buffer, 0, nativeUtf8, buffer.Length);
return nativeUtf8;
}
public static string StringFromNativeUtf8(IntPtr nativeUtf8) {
int len = 0;
while (Marshal.ReadByte(nativeUtf8, len) != 0) ++len;
byte[] buffer = new byte[len];
Marshal.Copy(nativeUtf8, buffer, 0, buffer.Length);
return Encoding.UTF8.GetString(buffer);
}
Slightly faster than Hans' solution (1 less buffer copy):
private unsafe IntPtr AllocConvertManagedStringToNativeUtf8(string input) {
fixed (char* pInput = input) {
var len = Encoding.UTF8.GetByteCount(pInput, input.Length);
var pResult = (byte*)Marshal.AllocHGlobal(len + 1).ToPointer();
var bytesWritten = Encoding.UTF8.GetBytes(pInput, input.Length, pResult, len);
Trace.Assert(len == bytesWritten);
pResult[len] = 0;
return (IntPtr)pResult;
}
}
private unsafe string MarshalNativeUtf8ToManagedString(IntPtr pStringUtf8)
=> MarshalNativeUtf8ToManagedString((byte*)pStringUtf8);
private unsafe string MarshalNativeUtf8ToManagedString(byte* pStringUtf8) {
var len = 0;
while (pStringUtf8[len] != 0) len++;
return Encoding.UTF8.GetString(pStringUtf8, len);
}
Here's I demo round-tripping a string:
var input = "Hello, World!";
var native = AllocConvertManagedStringToNativeUtf8(input);
var copy = MarshalNativeUtf8ToManagedString(native);
Marshal.FreeHGlobal(native); // don't leak unmanaged memory!
Trace.Assert(input == copy); // prove they're equal!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With