C# UTF-8 encoding: byte array index out of range

I have the following problem: if the string contains a character that is not in ASCII, the ASCII encoder replaces it with byte 63 ('?').

Because of that I changed the encoding to UTF-8, but a character can then take two or more bytes, so the byte array gets longer than 6 bytes and I get an index-out-of-range error.
How can I solve the problem?

System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();

byte[] baInput = enc.GetBytes(strInput);

// Split byte array (6 Byte) in date (days) and time (ms) parts
byte[] baMsec = new byte[4];
byte[] baDays = new byte[2];

for (int i = 0; i < baInput.Length; i++)
{
    if (4 > i)
    {
        baMsec[i] = baInput[i];
    }
    else
    {
        baDays[i - 4] = baInput[i];
    }
}

xproseal asked Jun 27 '16 06:06


2 Answers

The problem you seem to be having is that, with UTF-8, you know the number of characters in each part but not the number of bytes they encode to. To solve just that problem, split the string first and encode each part separately:

byte[] baMsec = Encoding.UTF8.GetBytes(strInput.Substring(0, 4));
byte[] baDays = Encoding.UTF8.GetBytes(strInput.Substring(4));

C.Evenhuis answered Nov 24 '22 11:11


Recommended Solution:

1) Split strInput using the Substring(Int32, Int32) method and store the date and time parts in separate String variables, say strDate and strTime.

2) Then call Encoding.UTF8.GetBytes on strDate and strTime and collect the resulting byte arrays in baDays and baMsec respectively.

Why this works:

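A C# String is stored as UTF-16, which represents non-ASCII characters just as well. Substring splits on characters rather than on encoded bytes, and each part is converted to UTF-8 only afterwards. Hence, no data is lost.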
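
As a minimal sketch of the two steps above: the class and method names here are hypothetical, and the code assumes strInput has at least 4 characters, with the first 4 being the time part as in the question.

```csharp
using System;
using System.Text;

public static class SplitThenEncode
{
    // Hypothetical helper: split on characters first, then encode each part.
    // Assumption: strInput has at least 4 characters (time part first).
    public static (byte[] Msec, byte[] Days) Split(string strInput)
    {
        string strTime = strInput.Substring(0, 4); // first 4 characters
        string strDate = strInput.Substring(4);    // remaining characters

        byte[] baMsec = Encoding.UTF8.GetBytes(strTime);
        byte[] baDays = Encoding.UTF8.GetBytes(strDate);
        return (baMsec, baDays);
    }
}
```

For the input "ab\u00EAdxy" (six characters, one of them 'ê'), Split returns a 5-byte baMsec and a 2-byte baDays, because 'ê' takes two bytes in UTF-8. The arrays are therefore sized by the encoder, not fixed at 4 and 2 bytes.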

General Caution:

Never try to manipulate encoded strings directly at the byte level; you'll get lost. If you want bytes, use the String and Encoding class methods of C# to get them.

Alternate approach:

I'm wondering (like others) why your date-time data contains non-numeric characters. I saw in a comment that you get your data from reader["TIMESTAMP2"].ToString(); and that the sample content is §║ ê or l¦h. Check whether reader["TIMESTAMP2"] actually holds numeric data that you are interpreting as a String by mistake, and whether you should treat it as a numeric type instead. Otherwise, even with this method, you'll be getting unexpected output soon.
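
If the column does turn out to hold the 6 raw bytes rather than text, a sketch of decoding them as numbers could look like the following. The helper name is hypothetical, and the little-endian byte order is an assumption (the question does not state it). The layout (4 bytes of milliseconds followed by 2 bytes of days) is taken from the comment in the question's code.

```csharp
using System;

public static class TimestampParts
{
    // Hypothetical helper: decodes 6 raw bytes laid out as
    // bytes 0-3 = milliseconds, bytes 4-5 = days.
    // Assumption: little-endian byte order; adjust if the source differs.
    public static (uint Msec, ushort Days) Decode(byte[] raw)
    {
        if (raw == null || raw.Length != 6)
            throw new ArgumentException("Expected exactly 6 bytes.", nameof(raw));

        uint msec = (uint)raw[0]
                  | (uint)raw[1] << 8
                  | (uint)raw[2] << 16
                  | (uint)raw[3] << 24;
        ushort days = (ushort)(raw[4] | raw[5] << 8);
        return (msec, days);
    }
}
```

With this approach no text encoding is involved at all, so neither the 63 replacement nor the variable UTF-8 length can occur. How you obtain the raw bytes from the reader depends on the actual column type, which is not known here.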


Kapil Dhaimade answered Nov 24 '22 11:11