
Proper character encoding to display "”"?

I'm having some nasty character encoding problems that I just can't figure out.

Essentially, I'm scraping some HTML from a site using PHP, then running it through PHP's DOMDocument to change some URLs, etc. When it's done, the HTML it outputs contains some odd characters. For example, where there should be a closing quote, it puts out ”

I have the page's meta tag for charset set to utf-8, but the ” characters then show up as â€ on the site. I'm not sure whether I just don't understand character encoding, or what.

Any suggestions on the best way to resolve this? Something client-side with a meta tag, or some kind of server-side PHP conversion?

Charles Zink asked Jun 21 '11 03:06




1 Answer

Sometimes setting the charset in the HTML or in the response header isn't enough. If not everything on your server is set up for UTF-8, your text may get incorrectly converted somewhere along the way. You may need to enable UTF-8 encoding for both Apache and PHP right in their config files. (If you're not using Apache, skip that step.)
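To see how text can get incorrectly converted along the way, here is a minimal Python sketch of what probably happens to the closing quote. It assumes that some component reads UTF-8 bytes as Windows-1252, which is a common cause of this symptom but not something confirmed for this particular setup. The variable names are only for illustration.

```python
# The right double quotation mark (U+201D), the character from the question.
quote = "\u201d"

# Encoded as UTF-8 it becomes three bytes: E2 80 9D.
utf8_bytes = quote.encode("utf-8")

# Assumption: a component wrongly treats those bytes as Windows-1252 and
# decodes each byte on its own. Byte 0x9D has no mapping in Python's
# cp1252 codec, so errors="ignore" drops it and two characters remain.
garbled = utf8_bytes.decode("cp1252", errors="ignore")

# Decoding the same bytes with the correct encoding restores the character.
restored = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(" "))
print(garbled)
print(restored == quote)
```

The bytes themselves are fine in this scenario. Only the label applied to them is wrong, which is why making every layer agree on UTF-8 tends to fix the display.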

Apache UTF-8 setup:

Edit either your charset.conf (ideal) or your httpd.conf file, adding this line to the end:

AddDefaultCharset utf-8

(If you don't have access to Apache's config files, you can create a ".htaccess" file in your site's root directory containing that same line.)

PHP UTF-8 setup:

Edit your php.ini file, searching for "default_charset", and change it to:

default_charset = "utf-8"

Restart Apache:

Depending on your server type, this command may do the trick from the command line:

sudo service apache2 restart
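
After the restart, it can help to confirm which charset the server actually announces. The sketch below is one way to read the charset out of a Content-Type header value using Python's standard library. The helper name charset_from_content_type and the sample header strings are hypothetical and only illustrate the idea.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset named in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() parses the charset parameter and lowercases it.
    return msg.get_content_charset()


# Sample header values (assumed for illustration, not taken from a real server).
with_charset = charset_from_content_type("text/html; charset=UTF-8")
without_charset = charset_from_content_type("text/html")

print(with_charset)
print(without_charset)
```

In practice the header value would come from a real response, for instance the Content-Type line printed by curl -I for your page. If no charset is reported there, the browser has to guess or fall back on the meta tag.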
gavanon answered Oct 10 '22 02:10
