
HTML5 Encoding & Cyrillic

Something that made me curious - supposedly the default character encoding in HTML5 is UTF-8. However if I have a plain simple HTML file with an HTML5 doctype like the code below, I get:

"hello" in Russian: "ЗдраÑтвуйте"

In Chrome 33+, Safari 6, IE11, etc.

<!DOCTYPE html>
<html>
<head></head>
<body>
    <p>"hello" in Russian is "здраствуйте"</p>
</body>
</html>

What gives? Shouldn't the browser use UTF-8 and display the text correctly? I'm using Coda, which is set to save HTML files with UTF-8 encoding by default, so that's not the problem.

dkugappi asked Mar 29 '14 19:03




2 Answers

The text in the example is UTF-8 encoded data misinterpreted as windows-1252. The reason is that the encoding has not been specified, so browsers are forced to guess. To fix this, specify the encoding; see the W3C page Character encodings. Two simple ways that work independently of server settings, as long as the server does not send wrong encoding information in HTTP headers:

1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program).

2) Add the following tag into the head part:

<meta charset=utf-8>
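To see what the two options amount to at the byte level, here is a minimal Python sketch. It is only an illustration, not part of the fix itself: the `page` string is a made-up stand-in for the file in the question, and `utf-8-sig` is Python's name for "UTF-8 with BOM".

```python
import codecs

# Hypothetical page content for illustration; it already carries the
# meta tag from option 2.
page = (
    '<!DOCTYPE html>\n'
    '<html>\n'
    '<head><meta charset=utf-8></head>\n'
    '<body><p>"hello" in Russian is "здраствуйте"</p></body>\n'
    '</html>\n'
)

# Option 1: the "utf-8-sig" codec prepends the UTF-8 byte order mark
# (bytes EF BB BF) to the encoded text.
with_bom = page.encode("utf-8-sig")

# Option 2 on its own: plain UTF-8 bytes, with the declaration carried
# by the meta tag inside the document.
without_bom = page.encode("utf-8")

print(with_bom[:3] == codecs.BOM_UTF8)  # True
print(with_bom[3:] == without_bom)      # True
```

Apart from the three leading BOM bytes, the two outputs are identical, so either option (or both together) can be used.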

There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in 8.2.2.2 Determining the character encoding.
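The effect of a wrong guess can be reproduced outside the browser. The sketch below is a rough approximation, assuming the guess lands on windows-1252 as in the question: it encodes the Russian word as UTF-8 and then decodes the same bytes both ways. Python's `cp1252` codec is used as a stand-in for the browser's windows-1252 decoder; it leaves a few bytes (such as 0x81) undefined, so `errors="replace"` is passed here, whereas browsers map those bytes to control characters instead.

```python
text = "здраствуйте"

# What an editor saving as UTF-8 writes to disk: two bytes per Cyrillic letter.
utf8_bytes = text.encode("utf-8")

# What happens when those bytes are read as windows-1252: every byte
# becomes its own (wrong) character.
misread = utf8_bytes.decode("cp1252", errors="replace")
print(misread)

# What happens when the declared encoding matches the actual one.
correct = utf8_bytes.decode("utf-8")
print(correct == text)  # True
```

The misread string has twice as many characters as the original word, which is the typical signature of UTF-8 text displayed as a single-byte encoding.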

Jukka K. Korpela answered Oct 19 '22 08:10



If you want to be sure which charset the browser will use, you must have this in your page head:

 <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

Otherwise you are at the mercy of local settings and the browser's automatic encoding detection.

All Blond answered Oct 19 '22 07:10
