
UTF-8 HTML without BOM displays strange characters

I have some HTML which contains some foreign characters (€, ó, á). The HTML document is saved as UTF-8 without a BOM. When I view the page in the browser, the foreign characters are replaced with strange character combinations (â‚¬, Ã³, Ã¡). Only when I save the HTML document as UTF-8 with a BOM do the characters display properly.

I'd really rather not include a BOM in my files. Does anybody have any idea why this happens, and is there a way to fix it other than including a BOM?

Matt Brailsford asked Mar 01 '12 15:03


People also ask

What is the difference between UTF-8 and UTF-8 without BOM?

There is no official difference between UTF-8 and BOM-ed UTF-8 apart from the signature itself. A BOM-ed UTF-8 string starts with the three bytes EF BB BF. Those bytes, if present, must be ignored when extracting the string from the file or stream.
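As a rough illustration of those three bytes, the following Python sketch builds the same text with and without a BOM and decodes both. The sample text and variable names are only for illustration; `utf-8-sig` is Python's standard codec that skips a leading BOM when one is present.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
BOM = codecs.BOM_UTF8

text = "€, ó, á"  # sample text, chosen for illustration
without_bom = text.encode("utf-8")
with_bom = BOM + without_bom

# "utf-8-sig" drops a leading BOM if present and otherwise behaves like "utf-8".
decoded_with_bom = with_bom.decode("utf-8-sig")
decoded_without_bom = without_bom.decode("utf-8-sig")

# Plain "utf-8" keeps the BOM as a leading U+FEFF character in the result.
kept_bom = with_bom.decode("utf-8")

print(BOM.hex())                                # efbbbf
print(decoded_with_bom == decoded_without_bom)  # True
print(repr(kept_bom[0]))                        # '\ufeff'
```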

Does UTF-8 need BOM?

In the UTF-8 encoding, the BOM is not essential because, unlike UTF-16 or UTF-32, UTF-8 has no byte-order variants: each character is always encoded as the same sequence of bytes.

What is UTF-8 without BOM?

The UTF-8 encoding without a BOM has the property that a document which contains only characters from the US-ASCII range is encoded byte-for-byte the same way as the same document encoded using the US-ASCII encoding. Such a document can be processed and understood when encoded either as UTF-8 or as US-ASCII.
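A small Python sketch of that property, using a made-up ASCII-only snippet: the two encodings produce identical bytes, while a BOM prefix makes the file invalid as US-ASCII.

```python
import codecs

ascii_only = "<p>Hello, world</p>"  # hypothetical ASCII-only document

utf8_bytes = ascii_only.encode("utf-8")
ascii_bytes = ascii_only.encode("ascii")
same_bytes = utf8_bytes == ascii_bytes

# With a BOM in front, the bytes are no longer valid US-ASCII.
bom_prefixed = codecs.BOM_UTF8 + utf8_bytes
try:
    bom_prefixed.decode("ascii")
    bom_is_ascii = True
except UnicodeDecodeError:
    bom_is_ascii = False

print(same_bytes)    # True
print(bom_is_ascii)  # False
```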


2 Answers

You are probably not specifying the correct character set in your HTML file. The BOM (thanks @Jukka) sends the browser into UTF-8 mode; in its absence, you need to use other means to declare the document as UTF-8.
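To see why the characters come out garbled, here is a minimal Python sketch. It assumes the browser falls back to windows-1252 (`cp1252` in Python) when nothing declares the encoding; that fallback is common in Western locales but depends on the browser and its settings.

```python
source = "€, ó, á"
raw = source.encode("utf-8")  # the bytes on disk, no BOM

# Assumption: the browser guesses windows-1252 because nothing says UTF-8.
misread = raw.decode("cp1252")

# What the browser shows once the document is declared as UTF-8.
correct = raw.decode("utf-8")

print(misread)  # â‚¬, Ã³, Ã¡
print(correct)  # €, ó, á
```

Each multi-byte UTF-8 sequence is read as two or three separate single-byte characters, which is where the extra symbols come from.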

If you have access to your server configuration, you may want to make sure the server isn't sending the wrong character set info. See, e.g., "How to change the default encoding to UTF-8 for Apache?"

If you have access only to your HTML, adding this meta tag in your document's head should do the trick:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>

or, as @Mathias points out, the newer HTML5 form:

<meta charset="utf-8"> 

(valid only if you use an HTML5 doctype, against which there is no good argument any more, even if you don't use HTML5 markup.)
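If you want to check a saved file for such a declaration, a rough Python sketch follows. The helper name `declared_encoding` and the regular expression are made up for this example, and it only approximates what a browser does. It looks at the first 1024 bytes because the HTML specification requires the declaration to appear within that range.

```python
import codecs
import re

# Hypothetical pattern: matches both <meta charset="..."> and the
# http-equiv form, where charset=... sits inside the content attribute.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([a-zA-Z0-9_-]+)", re.IGNORECASE
)

def declared_encoding(raw: bytes):
    """Return the encoding declared by a BOM or <meta> tag, or None."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Only the start of the file is scanned, as with a browser's prescan.
    match = META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else None

print(declared_encoding(b'<head><meta charset="utf-8"></head>'))  # utf-8
print(declared_encoding(b"<head><title>x</title></head>"))        # None
```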

Pekka answered Sep 22 '22 19:09


Insert <meta charset="utf-8"> in <head>, or set the header Content-Type: text/html; charset=utf-8 on the server side.

You can also add this to .htaccess: AddDefaultCharset UTF-8. More info here: http://www.askapache.com/htaccess/setting-charset-in-htaccess.html
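If the page is served by application code rather than Apache, the same header can be set there. Below is a minimal sketch using Python's standard WSGI interface; the `app` and `serve` names, the sample body, and the port are placeholders for illustration, not part of any particular setup.

```python
def app(environ, start_response):
    """Minimal WSGI app that serves UTF-8 HTML with an explicit charset."""
    body = "<!DOCTYPE html><title>Test</title><p>€, ó, á</p>".encode("utf-8")
    headers = [
        # The charset parameter tells the browser how to decode the bytes.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

def serve(port=8000):
    """Run the app locally; call serve() by hand to try it in a browser."""
    from wsgiref.simple_server import make_server
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

With the header in place, the file itself needs neither a BOM nor a meta tag, although keeping the meta tag still helps when the file is opened from disk.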

Nick Shvelidze answered Sep 21 '22 19:09