I have a form with a textarea. Users enter a block of text which is stored in a database.
Occasionally a user will paste text from Word containing smart quotes or emdashes. Those characters appear in the database as: –, ’, “ ,â€
What function should I call on the input string to convert smart quotes to regular quotes and emdashes to regular dashes?
I am working in PHP.
Update: Thanks for all of the great responses so far. The page on Joel's site about encodings is very informative: http://www.joelonsoftware.com/articles/Unicode.html
Some notes on my environment:
The MySQL database is using UTF-8 encoding. Likewise, the HTML pages that display the content are using UTF-8 (Update:) by explicitly setting the meta content-type.
On those pages the smart quotes and emdashes appear as a diamond with question mark.
Solution:
Thanks again for the responses. The solution was twofold:
htmlspecialchars()
instead of htmlentities()
.Click the AutoFormat As You Type tab, and under Replace as you type, select or clear the "Straight quotes" with “smart quotes” check box. Click the AutoFormat tab, and under Replace, select or clear the "Straight quotes" with “smart quotes” check box.
Smart quotes are those fancy quotes that point different directions—you know, you see them all the time in typeset material. Word uses smart quotes automatically as one of the features in AutoFormat.
This sounds like a Unicode issue. Joel Spolsky has a good jumping off point on the topic: http://www.joelonsoftware.com/articles/Unicode.html
The mysql database is using UTF-8 encoding. Likewise, the html pages that display the content are using UTF-8.
The content of the HTML can be in UTF-8, yes, but are you explicitly setting the content type (encoding) of your HTML pages (generated via PHP?) to UTF-8 as well? Try returning a Content-Type
header of "text/html;charset=utf-8"
or add <meta>
tags to your HTMLs:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
That way, the content type of the data submitted to PHP will also be the same.
I had a similar issue and adding the <meta>
tag worked for me.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With