C# - Comparing strings of different encodings

Using C#, I fetch a TextBox.Text value from an .ascx page. When I compare that value with a regular string object inside a LINQ query, the comparison always returns false.

I have come to the conclusion that they are differently encoded, but have so far had no luck in converting or comparing them.

string docname = "Testdoc 1.docx"; //regular string created in C#
string fetchedVal = ((TextBox)e.Item.FindControl("txtSelectedDocs")).Text; //UTF-8

The two strings above look identical as literals, but when I compare their byte[] representations they are obviously different due to the encoding.

I've tried a lot of different things, such as:

System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(fetchedVal));

but that will return the value "Testdoc 1.docx".

If I instead try

System.Text.Encoding.Default.GetString(System.Text.Encoding.Default.GetBytes(fetchedVal));

it returns "Testdoc 1.docx" but an Equals()-check still returns false.

I have also tried the following, which seems to be the recommended approach, but with no luck:

byte[] utf8Bytes = Encoding.UTF8.GetBytes(fetchedVal);
byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
string fetchedValConverted = Encoding.Unicode.GetString(unicodeBytes);

The culprit appears to be the whitespace, because when examining the byte sequence it's always the seventh byte that differs.

How do you properly convert from UTF-8 to default string encoding in C#?

Daniel B asked Sep 29 '14 15:09



1 Answer

Strings don't have encodings or byte arrays. Encodings only come into play when you convert a string into a byte array (or back), and you can only do that by specifying which encoding to use.
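As a minimal sketch of that point (the class and method names here are made up for illustration), the same string produces different byte arrays depending on the encoding you pick, and decoding with the matching encoding gives back an equal string:

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: encode and decode with the same encoding,
    // then check that the result equals the original string.
    public static bool RoundTrips(string s, Encoding enc) =>
        enc.GetString(enc.GetBytes(s)) == s;

    static void Main()
    {
        string s = "Testdoc 1.docx";

        // The encoding only matters at the string -> byte[] boundary.
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        byte[] utf16 = Encoding.Unicode.GetBytes(s); // UTF-16 little-endian

        Console.WriteLine(utf8.Length);   // 14
        Console.WriteLine(utf16.Length);  // 28

        // Different bytes, but the string itself is unchanged by a round trip.
        Console.WriteLine(RoundTrips(s, Encoding.UTF8));    // True
        Console.WriteLine(RoundTrips(s, Encoding.Unicode)); // True
    }
}
```

So converting between encodings and back cannot make two strings equal if their characters differ to begin with.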

It sounds like you actually simply have different characters in your strings. You might have an invisible character in one of them, or they might have different characters that look the same.

To find out, look at the Unicode code point values of each character in each string (e.g., (int) str[0]).
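One way to do that check is sketched below. The helper names (Dump, FirstDifference) are hypothetical, and the non-breaking space (U+00A0) in fetchedVal is only an assumed example of a look-alike character, not something confirmed by the question:

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Formats each char (UTF-16 code unit) of a string as U+XXXX.
    public static string Dump(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    // Returns the index of the first position where the strings differ,
    // or -1 if they are ordinally equal.
    public static int FirstDifference(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
            if (a[i] != b[i]) return i;
        return a.Length == b.Length ? -1 : n;
    }

    static void Main()
    {
        string docname = "Testdoc 1.docx";          // ordinary space, U+0020
        string fetchedVal = "Testdoc\u00A01.docx";  // assumed: non-breaking space, U+00A0

        Console.WriteLine(Dump(docname));
        Console.WriteLine(Dump(fetchedVal));
        Console.WriteLine(FirstDifference(docname, fetchedVal)); // 7
    }
}
```

If the dump does show a look-alike character, replacing it before comparing (for example fetchedVal.Replace('\u00A0', ' ')) is one possible fix.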

SLaks answered Oct 12 '22 21:10