Question mark characters display within text. Why is this?

I have a backup server that automatically backs up my live site, both files and database.

On the live site, the text looks fine, but when you view the mirrored version of it, it displays '?' within some of the text. This text is stored within the news database table.

Here are screenshots of the text on the live server and on the mirrored server.

What could go wrong in the process of backing it up to the mirrored server?

The live server runs Solaris, and the mirrored server runs Red Hat Linux 5.

asked Oct 27 '08 18:10 by Brad

2 Answers

The following sections of the MySQL Reference Manual will be useful:

  • 10.3 Specifying Character Sets and Collations
  • 10.4 Connection Character Sets and Collations

After you connect to the database, issue the following command:

SET NAMES 'utf8'; 
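
The literal '?' in the question is what a character-set conversion produces when the target charset has no code for a character, which is why the connection charset chosen by SET NAMES matters. The Python sketch below does not talk to MySQL; it assumes Python's `latin-1` codec as a stand-in for a narrower connection charset, and both function names are hypothetical.

```python
def to_narrow_charset(text: str, charset: str = "latin-1") -> bytes:
    """Encode text into a narrower charset, as a mismatched connection might.

    errors="replace" substitutes b"?" for every character the target charset
    cannot represent, which is the same symptom seen on the mirror.
    """
    return text.encode(charset, errors="replace")


def misread_utf8(text: str, assumed_charset: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset."""
    return text.encode("utf-8").decode(assumed_charset)


# to_narrow_charset("Ünïcode – text")  -> b'\xdcn\xefcode ? text'  (the en dash is lost)
# misread_utf8("café")                 -> 'cafÃ©'  (each UTF-8 byte read as its own character)
```

The first failure is lossy: once a '?' has been stored, the backup no longer contains the original character. The second is reversible as long as the underlying bytes are left untouched.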

Ensure that your web page also uses the UTF-8 encoding:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
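
When a page declares UTF-8, the bytes it serves must actually be UTF-8. A rough way to check a mirrored file is to read the charset from its meta tag and attempt a strict decode. This is a hypothetical helper, not part of any tool mentioned here, and its regex is a simplification of the HTML encoding-sniffing rules.

```python
import re

# Matches charset=... inside a <meta> declaration (simplified; not the full HTML sniffing algorithm).
META_CHARSET = re.compile(rb'charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def check_declared_charset(html: bytes) -> tuple:
    """Return (declared_charset, decodes_cleanly) for an HTML document's raw bytes."""
    match = META_CHARSET.search(html)
    if match is None:
        return None, False
    charset = match.group(1).decode("ascii").lower()
    try:
        html.decode(charset)  # strict decoding raises on any byte invalid in that charset
    except (UnicodeDecodeError, LookupError):
        return charset, False
    return charset, True
```

A clean decode under a single-byte charset such as windows-1252 proves little, since almost every byte value is valid there; the check is most useful for pages that declare UTF-8.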

PHP also offers several functions that are useful for converting between encodings:

  • iconv
  • mb_convert_encoding
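
iconv and mb_convert_encoding both take a string plus a source and a target encoding. Outside PHP, the same conversion is a decode followed by an encode; the Python below is an analogue, not a port of either PHP function, and `convert_encoding` is a hypothetical name.

```python
def convert_encoding(data: bytes, from_charset: str, to_charset: str = "utf-8") -> bytes:
    """Re-encode bytes from one charset to another (decode, then encode).

    Both steps use strict error handling, so an undecodable byte or an
    unencodable character raises instead of silently becoming '?'.
    """
    return data.decode(from_charset).encode(to_charset)


# Windows-1252 curly quotes (0x93, 0x94) become their three-byte UTF-8 forms:
# convert_encoding(b"\x93quoted\x94", "windows-1252")  -> b'\xe2\x80\x9cquoted\xe2\x80\x9d'
```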

answered Oct 04 '22 11:10 by IAdapter


Edit your Apache configuration file on the "mirror" server (the server with the problem) and comment out the following line:

AddDefaultCharset UTF-8 

Then restart Apache:

service httpd restart 

The problem is that the "AddDefaultCharset UTF-8" line makes Apache send a "charset=UTF-8" Content-Type header, and browsers honor that header over the Content-Type declared inside the .html files, e.g.:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252"> 
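
Browsers give the charset in the HTTP Content-Type header priority over a meta declaration inside the document (a byte-order mark, not modeled here, outranks both). The sketch below models only that precedence; the function names and the windows-1252 fallback are assumptions for illustration, since real browsers pick a locale-dependent default.

```python
from typing import Optional


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    if not value:
        return None
    for param in value.split(";")[1:]:
        key, _, val = param.strip().partition("=")
        if key.strip().lower() == "charset" and val.strip():
            return val.strip().strip("\"'").lower()
    return None


def effective_charset(http_header: Optional[str], meta_content: Optional[str],
                      fallback: str = "windows-1252") -> str:
    """Pick the charset a browser would use: HTTP header, then <meta>, then a fallback."""
    return (charset_from_content_type(http_header)
            or charset_from_content_type(meta_content)
            or fallback)


# With "AddDefaultCharset UTF-8" active, the header wins over the page's own meta tag:
# effective_charset("text/html; charset=UTF-8", "text/html; charset=windows-1252")  -> 'utf-8'
# With the directive commented out, the meta tag is used:
# effective_charset("text/html", "text/html; charset=windows-1252")  -> 'windows-1252'
```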

The most common symptom is that character codes above 127 display as black diamonds with question marks on them (in Chrome, Safari or Firefox), or as little boxes (in Internet Explorer and Opera).
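
The black diamond is U+FFFD, the Unicode replacement character, which a decoder substitutes for bytes that are not valid UTF-8. A minimal Python illustration, assuming the file's bytes are Windows-1252 while the browser decodes them as UTF-8; `render_as` is a hypothetical name.

```python
REPLACEMENT_CHARACTER = "\ufffd"  # shown by many browsers as a black diamond with '?'


def render_as(page_bytes: bytes, charset: str) -> str:
    """Decode page bytes roughly as a browser would, substituting U+FFFD for invalid input."""
    return page_bytes.decode(charset, errors="replace")


# Windows-1252 bytes: 0xE9 is 'é' and 0xA0 is a non-breaking space.
legacy_bytes = b"caf\xe9 and\xa0more"

# Decoded with the charset the file was written in, the text is intact.
assert render_as(legacy_bytes, "windows-1252") == "caf\u00e9 and\u00a0more"

# Forced to UTF-8 (for example by an overriding header), each stray byte above 127
# is not a valid UTF-8 sequence and becomes U+FFFD.
assert render_as(legacy_bytes, "utf-8") == "caf\ufffd and\ufffdmore"
```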

HTML files generated by Microsoft Word usually contain many such characters. The most common is character code 160 (0xA0), which in the Windows-1252 encoding is the non-breaking space (equivalent to "&nbsp;"); it often appears between span tags, like this:

<span style="mso-spacerun: yes">ááá </span> 
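
One way to make such Word output independent of whichever charset the server announces (an alternative to changing the Apache configuration, not what this answer proposes) is to rewrite every non-ASCII character as a numeric character reference. A sketch, assuming the file really is Windows-1252; `to_ascii_html` is a hypothetical name.

```python
def to_ascii_html(html_bytes: bytes, source_charset: str = "windows-1252") -> bytes:
    """Rewrite every non-ASCII character as an HTML numeric character reference.

    The result is plain ASCII, so it displays the same whether the server
    declares UTF-8 or Windows-1252.
    """
    text = html_bytes.decode(source_charset)  # strict: fails loudly if the source charset is wrong
    return text.encode("ascii", errors="xmlcharrefreplace")


word_span = b'<span style="mso-spacerun: yes">\xa0\xa0\xa0 </span>'
print(to_ascii_html(word_span).decode("ascii"))
# <span style="mso-spacerun: yes">&#160;&#160;&#160; </span>
```

Character references are not interpreted inside script or style elements, so this is only safe for ordinary markup and text.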

answered Oct 04 '22 11:10 by Dave Burton