Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I get a consistent byte representation of strings in C# without manually specifying an encoding?

How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?

I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes to play here.

Also, why should encoding even be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

like image 536
Agnel Kurian Avatar asked Jan 23 '09 13:01

Agnel Kurian


People also ask

How do you store bytes as strings?

The absolute safest way to convert bytes to a string and back is to use base64: string base64 = Convert. ToBase64String(bytes); byte[] bytes = Convert. FromBase64String(base64);

How do you determine character bytes?

Step 1: Get the character. Step 2: Convert the character into string using ToString() method. Step 3: Convert the string into byte using the GetBytes()[0] Method and store the converted string to the byte. Step 4: Return or perform the operation on the byte.

What is a byte string?

A byte string is a fixed-length array of bytes. A byte is an exact integer between 0 and 255 inclusive. A byte string can be mutable or immutable. When an immutable byte string is provided to a procedure like bytes-set!, the exn:fail:contract exception is raised.


2 Answers

Contrary to the answers here, you DON'T need to worry about encoding if the bytes don't need to be interpreted!

Like you mentioned, your goal is, simply, to "get what bytes the string has been stored in".
(And, of course, to be able to re-construct the string from the bytes.)

For those goals, I honestly do not understand why people keep telling you that you need the encodings. You certainly do NOT need to worry about encodings for this.

Just do this instead:

static byte[] GetBytes(string str) {     byte[] bytes = new byte[str.Length * sizeof(char)];     System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);     return bytes; }  // Do NOT use on arbitrary bytes; only use on GetBytes's output on the SAME system static string GetString(byte[] bytes) {     char[] chars = new char[bytes.Length / sizeof(char)];     System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);     return new string(chars); } 

As long as your program (or other programs) don't try to interpret the bytes somehow, which you obviously didn't mention you intend to do, then there is nothing wrong with this approach! Worrying about encodings just makes your life more complicated for no real reason.

Additional benefit to this approach: It doesn't matter if the string contains invalid characters, because you can still get the data and reconstruct the original string anyway!

It will be encoded and decoded just the same, because you are just looking at the bytes.

If you used a specific encoding, though, it would've given you trouble with encoding/decoding invalid characters.

like image 197
user541686 Avatar answered Oct 27 '22 22:10

user541686


It depends on the encoding of your string (ASCII, UTF-8, ...).

For example:

byte[] b1 = System.Text.Encoding.UTF8.GetBytes (myString); byte[] b2 = System.Text.Encoding.ASCII.GetBytes (myString); 

A small sample why encoding matters:

string pi = "\u03a0"; byte[] ascii = System.Text.Encoding.ASCII.GetBytes (pi); byte[] utf8 = System.Text.Encoding.UTF8.GetBytes (pi);  Console.WriteLine (ascii.Length); //Will print 1 Console.WriteLine (utf8.Length); //Will print 2 Console.WriteLine (System.Text.Encoding.ASCII.GetString (ascii)); //Will print '?' 

ASCII simply isn't equipped to deal with special characters.

Internally, the .NET framework uses UTF-16 to represent strings, so if you simply want to get the exact bytes that .NET uses, use System.Text.Encoding.Unicode.GetBytes (...).

See Character Encoding in the .NET Framework (MSDN) for more information.

like image 29
bmotmans Avatar answered Oct 27 '22 21:10

bmotmans