
Converting string to byte[] creates zero characters

In this conversion function:

public static byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

byte[] test = GetBytes("abc");

The resulting array contains zero bytes:

test = [97, 0, 98, 0, 99, 0]

And when we convert the byte[] back to a string, the result is:

string test = "a b c "

How can we make it not create those zeroes?
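The question does not say how the bytes were turned back into a string. Assuming a single-byte decoder such as Encoding.ASCII was used, each of the six bytes becomes its own character, so every 0 becomes a '\0' (NUL) character, which many consoles and debuggers render as a blank. A minimal sketch of that assumption (the DecodeDemo class and DecodeAsAscii method are hypothetical names):

```csharp
using System;
using System.Text;

class DecodeDemo
{
    // Bytes produced by the question's GetBytes("abc") on a little-endian machine.
    static readonly byte[] Utf16Bytes = { 97, 0, 98, 0, 99, 0 };

    // Assumption: the bytes are decoded with a single-byte encoding (ASCII here),
    // so every byte, including each 0, becomes a separate char.
    public static string DecodeAsAscii() => Encoding.ASCII.GetString(Utf16Bytes);

    static void Main()
    {
        string s = DecodeAsAscii();
        Console.WriteLine(s.Length);             // 6
        Console.WriteLine(s.Replace('\0', '_')); // a_b_c_  (NULs made visible)
    }
}
```

The NUL characters are real characters in the string, which is why the decoded text is twice as long as the original.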

asked Jan 06 '13 at 12:01 by strike_noir



2 Answers

First, let's look at what your code does wrong. In the .NET Framework, char is 16-bit (2 bytes), so sizeof(char) returns 2. For a one-character string such as " ", str.Length is 1, so your code effectively becomes byte[] bytes = new byte[2]. Buffer.BlockCopy() then copies those 2 bytes from the char array into the destination array. Each char is a UTF-16 code unit, and for ASCII characters its high-order byte is 0; on a little-endian machine that byte is stored second. So for the string " ", your GetBytes() method returns bytes[0] = 32 and bytes[1] = 0.
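The byte layout described above can be checked with a small sketch. It reuses the question's BlockCopy approach under the hypothetical name RawCharBytes, and also shows that Encoding.Unicode (UTF-16 little-endian) yields the same bytes and decodes them back cleanly. The byte values in the comments assume a little-endian machine, which is what .NET typically runs on (x86, x64, and ARM in its usual configuration).

```csharp
using System;
using System.Text;

class LayoutDemo
{
    // Same technique as the question's GetBytes: copy the raw memory of the char array.
    public static byte[] RawCharBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)]; // sizeof(char) == 2
        Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    static void Main()
    {
        Console.WriteLine(sizeof(char));                           // 2
        Console.WriteLine(string.Join(", ", RawCharBytes(" ")));   // 32, 0 (little-endian)
        Console.WriteLine(string.Join(", ", RawCharBytes("abc"))); // 97, 0, 98, 0, 99, 0 (little-endian)

        // Encoding.Unicode is UTF-16 little-endian: on a little-endian machine it
        // produces the same six bytes, and GetString maps each byte pair back to one char.
        byte[] utf16 = Encoding.Unicode.GetBytes("abc");
        Console.WriteLine(Encoding.Unicode.GetString(utf16));      // abc
    }
}
```

In other words, the zeros are not garbage: they are half of each 2-byte char, and they only cause trouble when the bytes are decoded with a different encoding than the one that produced them.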

Try using Encoding.ASCII.GetBytes() instead. The documentation for Encoding.GetBytes(String) describes it as follows:

"When overridden in a derived class, encodes all the characters in the specified string into a sequence of bytes."

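using System;
using System.Text;

// C# 9+ top-level statements
const string input = "Soner Gonul";

// ASCII maps each character to exactly one byte (non-ASCII characters become '?')
byte[] array = Encoding.ASCII.GetBytes(input);

foreach (byte element in array)
{
    Console.WriteLine("{0} = {1}", element, (char)element);
}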

Output:

83 = S
111 = o
110 = n
101 = e
114 = r
32 =
71 = G
111 = o
110 = n
117 = u
108 = l
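To cover the other half of the question (converting back), here is a minimal round-trip sketch, assuming the text is plain ASCII. ToBytes and FromBytes are hypothetical helper names; Encoding.ASCII.GetString is the standard decoding counterpart of Encoding.ASCII.GetBytes.

```csharp
using System;
using System.Text;

class AsciiConverter
{
    // One byte per character, so "abc" becomes exactly three bytes.
    public static byte[] ToBytes(string s) => Encoding.ASCII.GetBytes(s);

    // Decode with the same encoding that was used to encode.
    public static string FromBytes(byte[] b) => Encoding.ASCII.GetString(b);

    static void Main()
    {
        byte[] test = ToBytes("abc");
        Console.WriteLine(string.Join(", ", test)); // 97, 98, 99  (no zero bytes)
        Console.WriteLine(FromBytes(test));         // abc
    }
}
```

The key design point is symmetry: whichever encoding produces the bytes must also be used to read them back.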
answered Sep 24 '22 at 20:09 by Soner Gönül


Just to clear up the confusion here: the char type in C# takes 2 bytes, so string.ToCharArray() returns an array in which each item takes 2 bytes of storage. When that array is block-copied into a byte array, where each item takes 1 byte of storage, every char is spread over two bytes. No data is lost, but for ASCII characters the high-order byte is 0, hence the zeroes showing up in the result.
As suggested, Encoding.ASCII.GetBytes is a safer option to use when the string contains only ASCII characters.
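One caveat on "safer": Encoding.ASCII avoids the zero bytes, but it can only represent characters 0 to 127. The sketch below compares it with Encoding.UTF8 on a string containing non-ASCII letters. The class and method names are hypothetical, and the ASCII result assumes the default replacement fallback, which substitutes '?'.

```csharp
using System;
using System.Text;

class EncodingComparison
{
    // Encode then decode with ASCII; characters above 127 cannot survive this.
    public static string AsciiRoundTrip(string s) => Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));

    // Encode then decode with UTF-8, which can represent every Unicode character.
    public static string Utf8RoundTrip(string s) => Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(s));

    static void Main()
    {
        const string name = "G\u00F6n\u00FCl"; // "Gönül": 'ö' and 'ü' are outside ASCII

        Console.WriteLine(Encoding.ASCII.GetBytes(name).Length); // 5 (one byte per char)
        Console.WriteLine(AsciiRoundTrip(name));                 // G?n?l ('?' replaces ö and ü)

        Console.WriteLine(Encoding.UTF8.GetBytes(name).Length);  // 7 ('ö' and 'ü' take 2 bytes each)
        Console.WriteLine(Utf8RoundTrip(name) == name);          // True
    }
}
```

For text that may contain non-ASCII characters, UTF-8 keeps the compact one-byte form for ASCII letters while still round-tripping everything else.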

answered Sep 23 '22 at 20:09 by prthrokz