Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL choking on curly (smart) quotes

I'm inserting some data into a database from a form. I'm using addslashes to escape the text (have also tried mysql_real_escape_string with the same result).

Regular quotes are escaped, but some other quotes are not. For example, the string:

Homer's blood becomes the secret ingredient in Moe’s new beer.

is converted to:

Homer\'s blood becomes the secret ingredient in Moe’s new beer.

I didn't think the curly quote would matter unescaped, but only this text is inserted into the database:

Homer's blood becomes the secret ingredient in Moe

So PHP thinks the curly quote is fine, but MySQL is losing the string. MySQL is not giving any errors though.

like image 705
DisgruntledGoat Avatar asked Aug 23 '09 15:08

DisgruntledGoat


2 Answers

I would look for a mismatch between the character encoding used in your Web interface and that used at the database level. If your Web interface uses UTF-8, for example, and your database is using the default MySQL encoding of latin1, then you need to set up your tables with DEFAULT CHARSET=utf8.

Use mysql_real_escape_string() or mysqli, by the way. addslashes() is NOT adequate protection against SQL injection.

like image 87
chaos Avatar answered Sep 22 '22 17:09

chaos


The ’ in Moe’s is the only character in your example string that wouldn't be valid if that string is latin1 encoded but your mysql server expects utf8.

Simple demonstration:

<?php
function foo($s) {
    echo 'len=', strlen($s), ' ';
  for($i=0; $i<strlen($s); $i++) {
    printf('%02X ', ord($s[$i]));
  }
  echo "\n";
}

 // my file is latin1 encoded and so is the string literal
foo('Moe’s');
// now try it with an utf8 encoded string
foo( utf8_encode('Moe’s') );

prints

len=5 4D 6F 65 92 73
len=6 4D 6F 65 C2 92 73

Therefore the question is: Do you feed the mysql server something in a "wrong" encoding?
Each connection has a connection charset and the mysql server expects your client (php script) to send data that is encoded in that character set. You can find out what the connection charset is with

SHOW VARIABLES LIKE '%character%'

like in

$mysql = mysql_connect('..', '..', '..') or die(mysql_error());
mysql_select_db('..', $mysql) or die(mysql_error());

$query = "SHOW VARIABLES like '%character%'";
$result = mysql_query($query, $mysql) or die(__LINE__.mysql_error());
while( false!==($row=mysql_fetch_array($result, MYSQL_ASSOC)) ) {
  echo join(', ', $row), "\n";
}

This should print something like

character_set_client, utf8
character_set_connection, utf8
character_set_database, latin1
character_set_filesystem, binary
character_set_results, utf8
character_set_server, utf8
character_set_system, utf8

and character_set_connection, utf8 indicates that "my" connection character set is utf8, i.e. the mysql server expects utf8 encoded characters from the client (php). What's "your" connection charset?

Then take a look at the actual encoding of your parameter string, i.e. if you had

$foo = mysql_real_escape_string($_POST['foo'], $mysql);

replace that by

echo '<div>Debug hex($_POST[foo])=';
for($i=0; $i<strlen($s); $i++) {
    printf('%02X ', ord($_POST['foo'][$i]));
}
echo "</div>\n";
$foo = mysql_real_escape_string($_POST['foo'], $mysql);

and check what the actual encoding of your input string is. Does it print 92 or C2 92?

like image 38
VolkerK Avatar answered Sep 20 '22 17:09

VolkerK