
What is the default character encoding for HTML?

For some reason, a plain-text character on the HTML side is being displayed as â€“. The only cause I can think of is the character encoding. My guess is that it's UTF-8, but I'm not sure how I'm getting the weird characters. Is there an explanation?

By "default" I mean the encoding used when the charset isn't specified.

Chad Harrison asked Aug 28 '12 16:08


2 Answers

That certainly looks like UTF-8 being interpreted as something else.
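A minimal sketch of how this kind of garbling arises, assuming the original character was an en dash (U+2013) and that the browser decoded the bytes as Windows-1252; the question confirms neither, and the helper name `mojibake` is made up for illustration:

```python
def mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)

# An en dash is three bytes in UTF-8: E2 80 93.
# Read as Windows-1252, each byte becomes its own character.
garbled = mojibake("\u2013")
print([hex(ord(ch)) for ch in garbled])  # → ['0xe2', '0x20ac', '0x201c']
```

Those three code points are the letter a with circumflex, the euro sign and a left double quotation mark, which matches the sequence in the question.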

HTML doesn't have a default. It's picked up from the headers of the transfer protocol (normally HTTP) or failing that, from a BOM, from meta elements or, in the case of XHTML, the XML declaration. In the absence of any of those, the user-agent guesses.

HTTP has a default of ISO-8859-1, which even one HTML spec described as having "proved useless" [source] (they don't even go into the fact that a large amount of stuff out there labelled as ISO-8859-1 is actually CP-1252).
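The gap between the two labels can be seen with a single byte. This is only a sketch using Python's built-in codecs; the byte 0x93 is picked as an example and does not come from the question:

```python
raw = b"\x93"

# ISO-8859-1 maps 0x93 to an invisible C1 control character (U+0093).
as_latin1 = raw.decode("iso-8859-1")

# CP-1252 maps the same byte to a left double quotation mark (U+201C).
as_cp1252 = raw.decode("cp1252")

print(hex(ord(as_latin1)), hex(ord(as_cp1252)))  # → 0x93 0x201c
```

Content labelled ISO-8859-1 that contains bytes in the 0x80 to 0x9F range was almost certainly written as CP-1252.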

Hence: forget about defaults. Always set your HTTP headers and your meta elements (in case it's saved as a file).
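As a rough sketch of declaring the encoding in both places, here is a tiny server built on Python's standard `http.server`. The page content, the handler name `Utf8Handler`, the `serve` helper and the port are all made up for illustration; a real setup would normally configure the header in the web server or framework instead:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The meta element covers the case where the page is saved to disk
# and later opened without any HTTP headers.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Encoding demo</title>
</head>
<body><p>An en dash: \u2013</p></body>
</html>
"""

class Utf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The bytes on the wire must really be UTF-8, matching the label.
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8000) -> None:
    """Serve the demo page on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Utf8Handler).serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8000/` should show the dash correctly, because the header, the meta element and the actual bytes all agree.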

And always do so as UTF-8. Anything else in this day and age is just an act of masochism.

Jon Hanna answered Nov 04 '22 15:11


The !DOCTYPE doesn't set a character encoding; the meta element, together with the (newly standardized) charset attribute, does. If it's absent, I'm not entirely sure how the browser determines the encoding.

I believe the problem you're having though is that your page is saved in one encoding and served in another.

Just make sure you set <meta charset="utf-8"/> and make sure your document is in fact saved as UTF-8, and it should work.
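One way to check that a document really is UTF-8 is to try decoding its bytes strictly. This is only a sketch; the helper name `is_valid_utf8` is made up, and passing the check does not prove the file was meant as UTF-8 (pure ASCII passes too):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False

# An en dash saved as UTF-8 passes; the same character saved as
# Windows-1252 is the lone byte 0x96, which is not valid UTF-8.
print(is_valid_utf8("\u2013".encode("utf-8")))   # → True
print(is_valid_utf8("\u2013".encode("cp1252")))  # → False
```

For a file on disk, the bytes can be read with `open(path, "rb").read()` and passed to the same function.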

powerbuoy answered Nov 04 '22 16:11