In an HTML file with charset=utf-8, do I still need to replace umlauts with &auml;?


Whenever I used a German umlaut in an HTML file in the past, I always replaced it with &auml;, &ouml; etc. according to this table.

I had no idea about encoding then, nor did I ever think about it.
I just "knew" that if I used ä, ö etc. directly instead, many computers in other countries wouldn't be able to display the umlauts correctly.

When I set the charset of a UTF-8 encoded HTML file to UTF-8 by putting <meta charset="utf-8"> into the header, do I still need to replace ä by &auml;, ö by &ouml; and so on?

For example:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>ÄÖÜäöüß</title>
  </head>
  <body>
    ÄÖÜäöüß
  </body>
</html>

When I save this in a UTF-8 encoded HTML file on my machine and view it in a browser, all umlauts are displayed correctly.
But I'm in Germany and everything on my machine is in German, so of course my machine is able to properly display German umlauts.

I read Joel's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), but this encoding stuff is all new to me.

From what I understand about encoding, UTF-8 and setting charsets, I suspect that setting <meta charset="utf-8"> in a UTF-8 encoded HTML file means I do not need to use &auml; etc. anymore.
But I couldn't find a source that definitively says so.

Christian Specht asked Mar 16 '14 21:03



1 Answer

When I set the charset of an HTML file to UTF-8 by putting <meta charset="utf-8"> into the header

That doesn't set the character encoding. It declares which character encoding you are using. You must also make sure you are saving the HTML using that encoding.
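To see why the declaration and the bytes on disk have to agree, here is a minimal Python sketch. It is only an illustration: the variable names are made up, and decoding with "utf-8" stands in for what a browser does when it trusts <meta charset="utf-8">.

```python
# Illustration only: decoding as UTF-8 stands in for a browser
# that trusts a <meta charset="utf-8"> declaration.
text = "ÄÖÜäöüß"

# The same characters saved in two different encodings.
utf8_bytes = text.encode("utf-8")      # two bytes per character here
latin1_bytes = text.encode("latin-1")  # one byte per character

# Declaration matches the saved bytes: the text round-trips.
decoded_ok = utf8_bytes.decode("utf-8")

# Declaration says UTF-8 but the file was saved as Latin-1:
# the bytes are not valid UTF-8, so replacement characters appear.
decoded_bad = latin1_bytes.decode("utf-8", errors="replace")

print(decoded_ok == text)
print(decoded_bad == text)
```

The first comparison is true and the second is false: the meta tag changes nothing about the file itself, it only tells the reader how to interpret the bytes.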

Given that:

do I still need to replace ä by &auml;, ö by &ouml; and so on?

No. Only characters with special meaning in HTML (<, >, &, ", ' … all of which only have special meaning in some contexts) must be replaced with character references.
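As a rough illustration of which characters need character references, here is a short sketch using Python's standard html module. The sample string is made up; html.escape and html.unescape are the real standard-library functions.

```python
import html

# Made-up sample text containing umlauts and HTML-special characters.
raw = 'Größe < 5 & "Maß"'

# html.escape replaces &, <, > and (by default) the quote characters.
# Umlauts and ß are left untouched.
escaped = html.escape(raw)
print(escaped)

# Named references such as &auml; still decode to the same characters,
# so old pages that use them keep working.
restored = html.unescape("&auml;&ouml;&uuml;&szlig;")
print(restored)
```

The umlauts pass through unchanged, while <, & and " become &lt;, &amp; and &quot;. Using &auml; in a UTF-8 page is therefore optional, not wrong.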

Quentin answered Sep 27 '22 19:09