How do I convert a string
to a byte[]
in .NET (C#) without manually specifying a specific encoding?
I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes to play here.
Also, why should encoding even be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?
The absolute safest way to convert bytes to a string and back is to use base64: string base64 = Convert. ToBase64String(bytes); byte[] bytes = Convert. FromBase64String(base64);
Step 1: Get the character. Step 2: Convert the character into string using ToString() method. Step 3: Convert the string into byte using the GetBytes()[0] Method and store the converted string to the byte. Step 4: Return or perform the operation on the byte.
A byte string is a fixed-length array of bytes. A byte is an exact integer between 0 and 255 inclusive. A byte string can be mutable or immutable. When an immutable byte string is provided to a procedure like bytes-set!, the exn:fail:contract exception is raised.
Contrary to the answers here, you DON'T need to worry about encoding if the bytes don't need to be interpreted!
Like you mentioned, your goal is, simply, to "get what bytes the string has been stored in".
(And, of course, to be able to re-construct the string from the bytes.)
For those goals, I honestly do not understand why people keep telling you that you need the encodings. You certainly do NOT need to worry about encodings for this.
Just do this instead:
static byte[] GetBytes(string str) { byte[] bytes = new byte[str.Length * sizeof(char)]; System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length); return bytes; } // Do NOT use on arbitrary bytes; only use on GetBytes's output on the SAME system static string GetString(byte[] bytes) { char[] chars = new char[bytes.Length / sizeof(char)]; System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length); return new string(chars); }
As long as your program (or other programs) don't try to interpret the bytes somehow, which you obviously didn't mention you intend to do, then there is nothing wrong with this approach! Worrying about encodings just makes your life more complicated for no real reason.
Additional benefit to this approach: It doesn't matter if the string contains invalid characters, because you can still get the data and reconstruct the original string anyway!
It will be encoded and decoded just the same, because you are just looking at the bytes.
If you used a specific encoding, though, it would've given you trouble with encoding/decoding invalid characters.
It depends on the encoding of your string (ASCII, UTF-8, ...).
For example:
byte[] b1 = System.Text.Encoding.UTF8.GetBytes (myString); byte[] b2 = System.Text.Encoding.ASCII.GetBytes (myString);
A small sample why encoding matters:
string pi = "\u03a0"; byte[] ascii = System.Text.Encoding.ASCII.GetBytes (pi); byte[] utf8 = System.Text.Encoding.UTF8.GetBytes (pi); Console.WriteLine (ascii.Length); //Will print 1 Console.WriteLine (utf8.Length); //Will print 2 Console.WriteLine (System.Text.Encoding.ASCII.GetString (ascii)); //Will print '?'
ASCII simply isn't equipped to deal with special characters.
Internally, the .NET framework uses UTF-16 to represent strings, so if you simply want to get the exact bytes that .NET uses, use System.Text.Encoding.Unicode.GetBytes (...)
.
See Character Encoding in the .NET Framework (MSDN) for more information.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With