
<meta charset="utf-8"> declares encoding of own file?

I read that an HTML file has to contain the <meta charset="utf-8"> element in the head element to be standards-conforming.

Why does it make sense to specify the encoding of a file in the file itself? In order to read the meta element one has to know the encoding already, so it seems redundant or useless to specify the encoding again.

Toby Brull asked Sep 14 '13 12:09


People also ask

How do I change the character encoding of the file to UTF-8?

The character encoding should be specified for every HTML page, either by using the charset parameter on the Content-Type HTTP response header (e.g.: Content-Type: text/html; charset=utf-8 ) and/or using the charset meta tag in the file.

What is this meta charset UTF-8?

The charset attribute specifies the character encoding for the HTML document. The HTML5 specification encourages web developers to use the UTF-8 character set, which covers almost all of the characters and symbols in the world.

Is UTF-8 character set or encoding?

UTF-8 is a character encoding. It encodes ASCII characters as single bytes identical to their ASCII values, while representing all other characters, such as Chinese characters, as multi-byte sequences. As of the mid-2020s, UTF-8 is one of the most popular encodings.

Is UTF-8 the default encoding?

UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98% of all web pages, and up to 100.0% for some languages, as of 2022.


2 Answers

Until this element is read, the document is interpreted with the default encoding of the user agent. (This is often ISO-8859-1.) If the encoding is different from the default, then the document is re-interpreted according to the meta element. That's why you should place it as early as possible in the head, or preferably use an HTTP header (see below).

The hope with the <meta> element is that all the characters preceding it are in the ASCII character set, which is interpreted identically in just about every encoding.
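The point about ASCII can be checked with a small Python sketch. The list of encodings is an arbitrary selection for illustration, not an exhaustive one, and the variable names are made up for this example.

```python
# The declaration itself consists only of ASCII characters.
declaration = '<meta charset="utf-8">'
raw = declaration.encode("ascii")

# A few ASCII-compatible encodings, chosen only for illustration.
ascii_compatible = ["utf-8", "iso-8859-1", "windows-1252", "iso-8859-15", "koi8-r"]

# Decoding the same bytes under each encoding gives back the same text.
decoded = {name: raw.decode(name) for name in ascii_compatible}
all_match = all(text == declaration for text in decoded.values())

# UTF-16 is not ASCII-compatible: the same characters produce different bytes,
# so a reader guessing an ASCII-compatible default would not see the element.
utf16_differs = declaration.encode("utf-16-le") != raw
```

Here `all_match` comes out true because every listed encoding maps the bytes 0x00 to 0x7F to the same characters as ASCII, which is exactly what lets a browser find the declaration before it knows the real encoding.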

In general, however, and if it is possible, this information should be sent in an HTTP response header:

Content-Type: text/html; charset=utf-8

This ensures that the document is interpreted correctly from the start.
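As a rough sketch of the receiving side, the following Python snippet reads the charset parameter from a Content-Type header value and uses it to decode the body. The function names and the `windows-1252` fallback are assumptions made for this example; real browsers follow a more elaborate procedure.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


def decode_body(body, content_type, fallback="windows-1252"):
    """Decode the body using the header's charset, else a guessed default."""
    charset = charset_from_content_type(content_type) or fallback
    return body.decode(charset)


body = "<p>é</p>".encode("utf-8")
with_header = decode_body(body, "text/html; charset=utf-8")
without_header = decode_body(body, "text/html")
```

With the charset in the header, `with_header` is decoded correctly from the first byte. Without it, the fallback misreads the two UTF-8 bytes of "é" as two separate characters.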

cmbuckley answered Oct 19 '22 05:10


It's true that it's paradoxical for a document to declare its encoding within itself. And it really is only a secondary fallback. The HTTP Content-Type header always takes precedence if set; and it should always be set.

Declaring the charset in an HTML meta element makes sense in case the document is ever treated in a non-HTTP context, meaning it is not served over HTTP and hence cannot declare its encoding in an HTTP header. This may be the case if the document is downloaded and saved for later offline use. In this case it just so happens that most encodings are ASCII-compatible, and the browser will typically try to read the document in an ASCII-compatible default encoding like Latin-1 or UTF-8 (depending on the settings of the browser) until it encounters the meta tag. If your document is saved in an encoding that is not ASCII-compatible, say Shift-JIS or GB18030, this may or may not work depending on the default settings and how intelligently the browser can figure out what encoding it's dealing with; it's really mostly up to the browser how to deal with this situation.
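The offline case can be illustrated with a much-simplified Python sketch: scan the first bytes of the file for a charset declaration, then decode the whole file with whatever was found. This is not the browser's actual algorithm; the regular expression, the function names, the 1024-byte window and the `windows-1252` default are all assumptions made for the example.

```python
import re

# Matches charset=... inside a <meta> tag, working directly on raw bytes.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def sniff_meta_charset(raw, limit=1024):
    """Look for a meta charset declaration in the first `limit` bytes."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_html(raw, default="windows-1252"):
    """Decode with the declared charset, falling back to a default."""
    return raw.decode(sniff_meta_charset(raw) or default)


page = '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Zoë</title></head></html>'

raw_utf8 = page.encode("utf-8")
found_in_utf8 = sniff_meta_charset(raw_utf8)    # declaration is readable as ASCII

raw_utf16 = page.encode("utf-16-le")
found_in_utf16 = sniff_meta_charset(raw_utf16)  # bytes are interleaved with NULs
```

For the UTF-8 file the declaration is found and the document decodes correctly. For the UTF-16 file this naive scan finds nothing, which is the situation where the result depends on how much extra detection the browser performs.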

deceze answered Oct 19 '22 07:10