How do I get a consistent byte representation of strings in C# without manually specifying an encoding?

2 Answers

Contrary to the answers here, you DON'T need to worry about encoding if the bytes don't need to be interpreted!

Like you mentioned, your goal is, simply, to "get what bytes the string has been stored in".
(And, of course, to be able to re-construct the string from the bytes.)

For those goals, I honestly do not understand why people keep telling you that you need the encodings. You certainly do NOT need to worry about encodings for this.

Just do this instead:

    static byte[] GetBytes(string str) {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Do NOT use on arbitrary bytes; only use on GetBytes's output on the SAME system
    static string GetString(byte[] bytes) {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }

As long as neither your program nor any other program tries to interpret the bytes, which you never said you intend to do, there is nothing wrong with this approach! Worrying about encodings just makes your life more complicated for no real reason.

An additional benefit of this approach: it doesn't matter if the string contains invalid characters (such as unpaired surrogates), because you can still get the data and reconstruct the original string anyway!

It will be encoded and decoded just the same, because you are just looking at the bytes.

If you used a specific encoding, though, it would've given you trouble with encoding/decoding invalid characters.
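
To make the point about invalid characters concrete, here is a minimal sketch that compares the raw copy above with a UTF-8 round trip. The class name RoundTripDemo and the helper methods SurvivesRawCopy and SurvivesUtf8 are hypothetical names chosen for illustration; they are not part of .NET.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Same raw copy as GetBytes above: two bytes per UTF-16 code unit.
    public static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Inverse of GetBytes; only meant for GetBytes's own output.
    public static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }

    // True if the raw byte copy reproduces the string exactly.
    public static bool SurvivesRawCopy(string s) =>
        string.Equals(GetString(GetBytes(s)), s, StringComparison.Ordinal);

    // True if encoding to UTF-8 and decoding again reproduces the string exactly.
    public static bool SurvivesUtf8(string s) =>
        string.Equals(Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(s)), s,
                      StringComparison.Ordinal);
}
```

For a string with a lone surrogate, such as "a\uD800b", SurvivesRawCopy returns true while SurvivesUtf8 returns false, because the default UTF-8 encoder substitutes U+FFFD for the unpaired surrogate. For ordinary well-formed text both return true.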

197

answered Oct 27 '22 22:10

user541686

It depends on which encoding you want for the bytes (ASCII, UTF-8, ...).

For example:

    byte[] b1 = System.Text.Encoding.UTF8.GetBytes (myString);
    byte[] b2 = System.Text.Encoding.ASCII.GetBytes (myString);

A small sample of why encoding matters:

    string pi = "\u03a0";
    byte[] ascii = System.Text.Encoding.ASCII.GetBytes (pi);
    byte[] utf8 = System.Text.Encoding.UTF8.GetBytes (pi);

    Console.WriteLine (ascii.Length); //Will print 1
    Console.WriteLine (utf8.Length); //Will print 2
    Console.WriteLine (System.Text.Encoding.ASCII.GetString (ascii)); //Will print '?'

ASCII simply isn't equipped to deal with characters outside its 7-bit range (U+0000 to U+007F), so anything else is replaced with '?'.

Internally, the .NET Framework uses UTF-16 to represent strings, so if you simply want to get the exact bytes that .NET uses, use System.Text.Encoding.Unicode.GetBytes (...), which encodes as little-endian UTF-16.
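
As a rough illustration of that last point, the sketch below checks that Encoding.Unicode produces two bytes per char and, on a little-endian machine, the same bytes as a raw memory copy of the string's chars. The Utf16Demo class and its method names are hypothetical, and the comparison is only expected to hold for strings without unpaired surrogates.

```csharp
using System;
using System.Linq;
using System.Text;

static class Utf16Demo
{
    // UTF-16LE bytes from the framework encoder (GetBytes adds no byte order mark).
    public static byte[] EncodedBytes(string s) => Encoding.Unicode.GetBytes(s);

    // Raw in-memory bytes of the string's chars, in the machine's byte order.
    public static byte[] RawBytes(string s)
    {
        byte[] bytes = new byte[s.Length * sizeof(char)];
        Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // True when both approaches yield identical bytes; expected on
    // little-endian hardware for well-formed strings.
    public static bool SameBytes(string s) =>
        EncodedBytes(s).SequenceEqual(RawBytes(s));
}
```

Checking BitConverter.IsLittleEndian before relying on SameBytes is a reasonable guard. Unlike the raw copy, the Encoding.Unicode output has a fixed byte order, so it can be decoded on another machine with Encoding.Unicode.GetString.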

See Character Encoding in the .NET Framework (MSDN) for more information.

answered Oct 27 '22 21:10

bmotmans

Tags: string, c#, .net, character-encoding

Agnel Kurian
