Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this appearing in my c# strings: £

I have a a string in c# initialised as follows:

string strVal = "£2000";

However whenever I write this string out the following is written:

£2000

It does not do this with dollars.

An example bit of code I am using to write out the value:

System.IO.File.AppendAllText(HttpContext.Current.Server.MapPath("/logging.txt"), strVal);

I'm guessing it's something to do with localization but if c# strings are just unicode surely this should just work?

CLARIFICATION: Just a bit more info, Jon Skeet's answer is correct, however I also get the issue when I URLEncode the string. Is there a way of preventing this?

So the URL encoded string looks like this:

"%c2%a32000"

%c2 = Â %a3 = £

If I encode as ASCII the £ comes out as ?

Any more ideas?

like image 591
Tim Avatar asked Mar 30 '09 10:03

Tim


2 Answers

AppendAllText is writing out the text in UTF-8.

What are you using to look at it? Chances are it's something that doesn't understand UTF-8, or doesn't try UTF-8 first. Tell your editor/viewer that it's a UTF-8 file and all should be well. Alternatively, use the overload of AppendAllText which allows you to specify the encoding and use whichever encoding is going to be most convenient for you.

EDIT: In response to your edited question, the reason it fails when you encode with ASCII is that £ is not in the ASCII character set (which is Unicode 0-127).

URL encoding is also using UTF-8, by the looks of it. Again, if you want to use a different encoding, specify it to the HttpUtility.UrlEncode overload which accepts an encoding.

like image 91
Jon Skeet Avatar answered Sep 27 '22 23:09

Jon Skeet


The default character set of URLs when used in HTML pages and in HTTP headers is called ISO-8859-1 or ISO Latin-1.

It's not the same as UTF-8, and it's not the same as ASCII, but it does fit into one-byte-per-character. The range 0 to 127 is a lot like ASCII, and the whole range 0 to 255 is the same as the range 0000-00FF of Unicode.

So you can generate it from a C# string by casting each character to a byte, or you can use Encoding.GetEncoding("iso-8859-1") to get an object to do the conversion for you.

(In this character set, the UK pound symbol is 163.)

Background

The RFC says that unencoded text must be limited to the traditional 7-bit US ASCII range, and anything else (plus the special URL delimiter characters) must be encoded. But it leaves open the question of what character set to use for the upper half of the 8-bit range, making it dependent on the context in which the URL appears.

And that context is defined by two other standards, HTTP and HTML, which do specify the default character set, and which together create a practically irresistable force on implementers to assume that the address bar contains percent-encodings that refer to ISO-8859-1.

ISO-8859-1 is the character set of text-based content sent via HTTP except where otherwise specified. So by the time a URL string appears in the HTTP GET header, it ought to be in ISO-8859-1.

The other factor is that HTML also uses ISO-8859-1 as its default, and URLs typically originate as links in HTML pages. So when you craft a simple minimal HTML page in Notepad, the URLs you type into that file are in ISO-8859-1.

It's sometimes described as "hole" in the standards, but it's not really; it's just that HTML/HTTP fill in the blank left by the RFC for URLs.

Hence, for example, the advice on this page:

URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.

(ISO-Latin is another name for IS-8859-1).

So much for the theory. Paste this into notepad, save it as an .html file, and open it in a few browsers. Click the link and Google should search for UK pound.

<HTML>
  <BODY>
    <A href="http://www.google.com/search?q=%a3">Test</A>
  </BODY>
</HTML>

It works in IE, Firefox, Apple Safari, Google Chrome - I don't have any others available right now.

like image 42
Daniel Earwicker Avatar answered Sep 27 '22 23:09

Daniel Earwicker