Reading a null-terminated string

Tags:

c#

I am reading strings from a binary file. Each string is null-terminated and encoded as UTF-8. In Python I simply read one byte at a time, appending each byte to a byte array until I see a 0. Then I convert the byte array into a string and move on. All of the strings are read correctly this way.

How can I do the same in C#? I don't think I have the luxury of simply appending bytes to an array, since arrays are fixed-size.

MxLDevs asked Jul 29 '12 23:07


People also ask

What is null-terminated character string?

In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character (a character with a value of zero, usually called NUL).

What is a null-terminated byte string?

A null-terminated byte string (NTBS) is a sequence of nonzero bytes followed by a byte with value zero (the terminating null character). Each byte in a byte string encodes one character of some character set.

Why does C use null-terminated strings?

Because in C, strings are just a sequence of characters accessed via a pointer to the first character. A pointer has no room to store the length, so something has to mark where the string ends. In C it was decided that a null character would be that marker.

Are UTF-8 strings null-terminated?

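Not inherently: UTF-8 is an encoding and does not require a terminator. It does encode U+0000 (NUL) as the single byte 0x00, and that byte never occurs inside the encoding of any other character, so a zero byte can safely be used to terminate UTF-8 text.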


1 Answer

The following should get you what you are looking for. It reads the whole file into memory and splits it on zero bytes; all of the decoded strings end up in the myText list.

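using System.Collections.Generic;
using System.IO;
using System.Text;

byte[] data = File.ReadAllBytes("myfile.bin");  // whole file in memory
List<string> myText = new List<string>();
int lastOffset = 0;                             // start of the current string
for (int i = 0; i < data.Length; i++)
{
    if (data[i] == 0)                           // terminator found
    {
        // Decode the bytes between the previous terminator and this one.
        myText.Add(Encoding.UTF8.GetString(data, lastOffset, i - lastOffset));
        lastOffset = i + 1;                     // skip past the terminator
    }
}
// Any bytes after the last terminator are ignored.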
loopedcode answered Oct 19 '22 05:10
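
For files too large to load in one go, or when the strings sit in the middle of other binary data, a byte-at-a-time approach closer to the Python version in the question is also possible. The sketch below is not from the answer above: the class name NullTerminatedReader and the methods TryReadString and ReadAll are hypothetical names chosen for illustration. It assumes a List<byte> is an acceptable growable buffer, which sidesteps the fixed-size array concern.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class NullTerminatedReader
{
    // Reads bytes up to (not including) the next 0x00 and decodes them as UTF-8.
    // Returns false when the stream is already at its end and nothing was read.
    public static bool TryReadString(Stream stream, out string value)
    {
        var buffer = new List<byte>();  // grows as needed, unlike an array
        int b;
        // ReadByte returns -1 at end of stream, otherwise a value in 0..255.
        while ((b = stream.ReadByte()) > 0)
        {
            buffer.Add((byte)b);
        }

        if (b == -1 && buffer.Count == 0)
        {
            value = string.Empty;
            return false;
        }

        // Reached either a terminator or the end of an unterminated final string.
        value = Encoding.UTF8.GetString(buffer.ToArray());
        return true;
    }

    // Reads every null-terminated string from the current position to the end.
    public static List<string> ReadAll(Stream stream)
    {
        var result = new List<string>();
        while (TryReadString(stream, out string s))
        {
            result.Add(s);
        }
        return result;
    }
}
```

With a file, this could be called as NullTerminatedReader.ReadAll(File.OpenRead("myfile.bin")), ideally inside a using statement so the stream is closed. Unlike the loop in the answer, this sketch also returns a final string that has no terminator, which may or may not be what you want.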