
Bad encoding: why is my medium-sized dash encoded differently on another server?

My "em dash" character is shown differently on two servers.

When I visit Server 1: –

When I visit Server 2: â€"Â

I'm not using any database connection, just pure HTML.

Here are the first four lines of my HTML file:

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta charset="utf-8" />

Please help; I can't see what's wrong with it.

-solution-

As suggested below, I replaced my dash with the numeric character reference

&#8211;
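A numeric character reference such as &#8211; consists only of ASCII characters, so it survives whatever charset the server declares. A minimal Python sketch of doing this conversion automatically, using the standard "xmlcharrefreplace" error handler (the helper name to_ascii_html is hypothetical):

```python
import html


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    # "xmlcharrefreplace" turns each character ASCII cannot encode into &#NNNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


snippet = "2010\u20132012 \u25ba next"  # EN DASH (U+2013) and the ► character (U+25BA)
escaped = to_ascii_html(snippet)
print(escaped)                            # 2010&#8211;2012 &#9658; next
# html.unescape reverses the conversion
print(html.unescape(escaped) == snippet)  # True
```

This treats the symptom per character; saving the file as UTF-8 and serving it with a matching header avoids the need for references altogether.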

To make the server display my ► character correctly, I had to put a .htaccess file in that folder containing the following line:

AddDefaultCharset UTF-8
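To see which charset a server actually declares (for example before and after adding the AddDefaultCharset line, which according to the Apache documentation only adds a charset to text/html and text/plain responses that do not already have one), here is a small standard-library Python sketch. The function names and the command-line usage are made up for illustration:

```python
import sys
import urllib.request
from email.message import Message


def charset_of(content_type: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_charset(url: str):
    """Fetch url and return the charset declared in its Content-Type header, if any."""
    with urllib.request.urlopen(url) as resp:
        value = resp.headers.get("Content-Type")
        return charset_of(value) if value else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_charset.py URL [URL ...]
    for url in sys.argv[1:]:
        print(url, "->", declared_charset(url))
```

Running it against the same page on both servers (the URLs are whatever your pages are) should show whether the two servers disagree on the charset.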

Thanks everyone!

G McLuhan, asked Mar 19 '12 at 16:03



2 Answers

This may well happen if the servers send different Content-Type headers. Exactly the same document can be interpreted differently when it is served with different encoding information.
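This would also explain why the <meta> tags in the question do not help: under the WHATWG HTML encoding-sniffing rules, a byte order mark comes first, then a charset given in the HTTP Content-Type header, and only after that a <meta> declaration near the start of the file. Below is a heavily simplified Python model of that order; pick_encoding, the regex, and the windows-1252 fallback are illustrative assumptions, and real browsers handle many more cases (for instance they map the label iso-8859-1 to windows-1252):

```python
import re
from email.message import Message

# Naive pattern covering <meta charset="..."> and
# <meta http-equiv="Content-Type" content="...; charset=...">
META_CHARSET = re.compile(rb"""<meta[^>]*charset=["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def pick_encoding(body: bytes, content_type: str, fallback: str = "windows-1252") -> str:
    """Simplified model of how a browser chooses the encoding of an HTML page."""
    # 1. A UTF-8 byte order mark wins over everything else.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. Next comes a charset in the HTTP Content-Type header.
    msg = Message()
    msg["Content-Type"] = content_type
    header_charset = msg.get_content_charset()
    if header_charset:
        return header_charset
    # 3. Only then is a <meta> declaration in the first 1024 bytes consulted.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Otherwise a locale-dependent default; windows-1252 is assumed here.
    return fallback


page = b'<html><head><meta charset="utf-8" /></head>'
print(pick_encoding(page, "text/html; charset=ISO-8859-1"))  # iso-8859-1
print(pick_encoding(page, "text/html"))                      # utf-8
```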

It is also possible that something gets changed when the file is uploaded (an incorrect conversion). But in this case, as usually, the header issue most likely explains the difference.
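To rule out a conversion during upload, one option is to check that the file on the server is still valid UTF-8. A minimal Python sketch (the function name is hypothetical): if an upload tool re-saved the file as windows-1252, the en dash would become the single byte 0x96, which is not valid UTF-8 on its own.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


good = "a \u2013 b".encode("utf-8")        # EN DASH stored as e2 80 93
converted = "a \u2013 b".encode("cp1252")  # same text saved as windows-1252: 0x96
print(first_invalid_utf8_offset(good))       # None
print(first_invalid_utf8_offset(converted))  # 2
```

For a file on disk you could pass pathlib.Path("page.html").read_bytes(), where page.html is a placeholder for your own file name.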

If the document is UTF-8 encoded and contains “–” (which is EN DASH, U+2013, not EM DASH), it is displayed correctly when the headers specify Content-Type: text/html; charset=utf-8. But if the header says e.g. windows-1252 instead of utf-8, the three bytes that make up the UTF-8 representation of “–”, namely 0xE2 0x80 0x93, are interpreted as three separate windows-1252 characters: â, € and “. If you really see â€"Â, what happens beyond that is somewhat obscure, but it's more important to fix the encoding issue, which will probably solve the problem.
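The byte-level mix-up described above can be reproduced in a few lines of Python ("cp1252" is Python's name for windows-1252). The no-break space after the dash is only a guess at where the trailing Â in the question could come from; the question does not confirm it.

```python
en_dash = "\u2013"                    # EN DASH
utf8_bytes = en_dash.encode("utf-8")
print(utf8_bytes.hex(" "))            # e2 80 93

# The same three bytes read as windows-1252
misread = utf8_bytes.decode("cp1252")
print(misread)                        # three characters: â, €, “

# Assumption: a following no-break space (U+00A0, bytes c2 a0) would add "Â"
# plus an invisible no-break space when misread the same way.
with_nbsp = (en_dash + "\u00a0").encode("utf-8").decode("cp1252")
print(ascii(with_nbsp))               # '\xe2\u20ac\u201c\xc2\xa0'

# Reversing the mistake recovers the original character
print(misread.encode("cp1252").decode("utf-8") == en_dash)  # True
```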

Check out the W3C tutorial on encodings.

Jukka K. Korpela, answered Oct 04 '22 at 21:10



It's possible they're being served with different encodings. In UTF-8, you can include the em dash directly (—), but if the page is being served as ASCII, it needs to be encoded as &mdash;. Look at the page source to see which one it uses.

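I think this is what's happening, because "—" is multiple bytes long in UTF-8, so it would be interpreted as multiple single-byte characters. (Strictly speaking, those bytes are all above 0x7F and are not ASCII at all; browsers treat a page labelled as ASCII as windows-1252.)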
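A quick Python check of the byte counts involved (this only illustrates the encoding facts, not what a particular server does): the em dash is one character but three UTF-8 bytes, all above 0x7F, so strict ASCII cannot decode them, while the &mdash; reference is plain ASCII.

```python
import html

em_dash = "\u2014"  # EM DASH
data = em_dash.encode("utf-8")
print(len(em_dash), len(data))      # 1 3
print(all(b > 0x7F for b in data))  # True: none of these bytes is ASCII

try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)

# The named reference is pure ASCII and still means the same character
print("&mdash;".isascii(), html.unescape("&mdash;") == em_dash)  # True True
```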

Brendan Long, answered Oct 04 '22 at 21:10
