An example HTML document retrieved over HTTP lacks:
Content-Type
header<meta charset="<character encoding>" />
<meta http-equiv='Content-Type' content='Type=text/html; charset=<character encoding>'>
With regards to HTML5, is a default, for example UTF-8, assumed as the character encoding? Or is it entirely up the application reading the HTML document to choose a default?
The default character encoding for HTML5 is UTF-8.
Quick answer. Always declare the encoding of your document using a meta element with a charset attribute, or using the http-equiv and content attributes (called a pragma directive).
encoding attribute, Java uses “UTF-8” character encoding by default. Character encoding basically interprets a sequence of bytes into a string of specific characters. The same combination of bytes can denote different characters in different character encoding.
Meta tag is used to indicate the character set. meta tag is used to indicate char set.
The charset is determined using these rules:
- User override.
- An HTTP "charset" parameter in a "Content-Type" field.
- A Byte Order Mark before any other data in the HTML document itself.
- A META declaration with a "charset" attribute.
- A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset".
- Unspecified heuristic analysis.
...and then...
- Normalize the given character encoding string according to the Charset Alias Matching rules defined in Unicode Technical Standard #22.
- Override some problematic encodings, i.e. intentionally treat some encodings as if they were different encodings. The most common override is treating US-ASCII and ISO-8859-1 as Windows-1252, but there are several other encoding overrides listed in this table. As the specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
But the most important thing is:
You should always specify a character encoding on every HTML document, or bad things will happen. You can do it the hard way (HTTP Content-Type header), the easy way (
<meta http-equiv>
declaration), or the new way (<meta charset>
attribute), but please do it. The web thanks you.
Sources:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With