I'm parsing a file (which I don't generate) that contains a string. The string is always preceded by 2 bytes which tell me the length of the string that follows.
For example:
05 00 53 70 6F 72 74
would be:
Sport
Using a C# BinaryReader, I read the string using:
string s = new string(binaryReader.ReadChars(size));
Sometimes there's the odd funky character which seems to push the position of the stream on further than it should. For example:
0D 00 63 6F 6F 6B 20 E2 80 94 20 62 6F 6F 6B
Should be:
cook - book
and although it reads fine the stream ends up two bytes further along than it should?! (Which then messes up the rest of the parsing.)
I'm guessing it has something to do with the 0xE2 in the middle, but I'm not really sure why or how to deal with it.
Any suggestions greatly appreciated!
My guess is that the string is encoded in UTF-8. The 3-byte sequence E2 80 94
corresponds to the single Unicode character U+2014 (EM DASH).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With