
Fastest way to convert a possibly-null-terminated ascii byte[] to a string?


I need to convert a (possibly) null-terminated array of ASCII bytes to a string in C#, and the fastest way I've found to do it is my UnsafeAsciiBytesToString method, shown below. This method uses the String.String(sbyte*) constructor, whose remarks contain a warning:

"The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).

Note:
* Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems.
* ...
* If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation."

Now, I'm positive that the way the string is encoded will never change... but the default codepage on the system that my app is running on might change. So, is there any reason that I shouldn't run screaming from using String.String(sbyte*) for this purpose?

    using System;
    using System.Text;

    namespace FastAsciiBytesToString
    {
        static class StringEx
        {
            public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
            {
                int maxIndex = offset + maxLength;

                for( int i = offset; i < maxIndex; i++ )
                {
                    // Skip non-nulls.
                    if( buffer[i] != 0 ) continue;
                    // First null we find, return the string.
                    return Encoding.ASCII.GetString(buffer, offset, i - offset);
                }
                // Terminating null not found. Convert the entire section from offset to maxLength.
                return Encoding.ASCII.GetString(buffer, offset, maxLength);
            }

            public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
            {
                string result = null;

                unsafe
                {
                    fixed( byte* pAscii = &buffer[offset] )
                    {
                        result = new String((sbyte*)pAscii);
                    }
                }

                return result;
            }
        }

        class Program
        {
            static void Main(string[] args)
            {
                byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };

                string result = asciiBytes.AsciiBytesToString(3, 6);

                Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);

                result = asciiBytes.UnsafeAsciiBytesToString(3);

                Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

                // Non-null terminated test.
                asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };

                result = asciiBytes.UnsafeAsciiBytesToString(3);

                Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

                Console.ReadLine();
            }
        }
    }
asked Sep 27 '08 at 18:09 by Wayne Bloss
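
The two risks quoted from the String.String(sbyte*) remarks (dependence on the default code page, and undefined behavior without a terminator) can both be sidestepped while keeping a pointer-based conversion. The following is only a sketch, not the asker's method: it assumes the project is compiled with unsafe code enabled, and the class and method names (PinnedAsciiExample, BoundedAsciiBytesToString) are hypothetical. It bounds the scan by the array length and passes Encoding.ASCII explicitly to the String(sbyte*, int, int, Encoding) constructor. Whether it is actually faster than the safe version would need to be measured.

```csharp
using System;
using System.Text;

static class PinnedAsciiExample
{
    // Hypothetical helper for illustration; not part of the original post.
    public static unsafe string BoundedAsciiBytesToString(byte[] buffer, int offset)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || offset > buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        if (offset == buffer.Length) return string.Empty;

        fixed (byte* pBase = buffer)
        {
            byte* p = pBase + offset;
            int max = buffer.Length - offset;

            // Stop at the first null, or at the end of the array if none exists,
            // so the scan never reads past the managed buffer.
            int len = 0;
            while (len < max && p[len] != 0) len++;

            // The encoding is stated explicitly, so Encoding.Default is not consulted.
            return new string((sbyte*)p, 0, len, Encoding.ASCII);
        }
    }
}
```

With the question's test data, calling this helper with offset 3 returns "abc" both for the null-terminated array and for the array that ends directly after 'c'.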




1 Answer

One-liner (assuming the buffer contains exactly one well-formed, null-terminated string and nothing but null padding after it):

String MyString = Encoding.ASCII.GetString(MyByteBuffer).TrimEnd((Char)0); 
answered Oct 15 '22 at 09:10 by user3042599

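
The one-liner in the answer trims only trailing nulls, so any bytes that follow the first terminator end up in the result. A possible variant that stops at the first null and supports an offset is sketched below. It is an illustration rather than part of the original answer: the names AsciiBufferExample and FirstAsciiString are hypothetical, and it relies only on Array.IndexOf and Encoding.ASCII.GetString.

```csharp
using System;
using System.Text;

static class AsciiBufferExample
{
    // Hypothetical helper for illustration; not part of the original answer.
    public static string FirstAsciiString(byte[] buffer, int offset, int maxLength)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || maxLength < 0 || offset > buffer.Length - maxLength)
            throw new ArgumentOutOfRangeException();
        if (maxLength == 0) return string.Empty;

        // Search for the first null within [offset, offset + maxLength).
        int nullIndex = Array.IndexOf(buffer, (byte)0, offset, maxLength);

        // No terminator found: convert the whole window instead.
        int length = nullIndex < 0 ? maxLength : nullIndex - offset;

        return Encoding.ASCII.GetString(buffer, offset, length);
    }
}
```

For a buffer holding 'a', 'b', 'c', 0, 'x', 'y', calling FirstAsciiString(buffer, 0, 6) returns "abc", whereas the TrimEnd one-liner would keep the embedded null and the trailing "xy".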