Answers provided have all been great, I mentioned in the comments of Alnitak's answer that I would need to go take a look at my CSV Generation script because for whatever reason it wasn't outputting UTF-8.
As was correctly pointed out, it WAS outputting UTF-8 - the problem existed with Ye Olde Microsoft Excel which wasn't picking up the encoding the way I would have liked.
My existing CSV generation looked something like:
// Create file and exit;
$filename = $file."_".date("Y-m-d_H-i",time());
header("Content-type: application/vnd.ms-excel");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header( "Content-disposition: filename=".$filename.".csv");
echo $csv_output;
It now looks like:
// Create file and exit;
$filename = $file."_".date("Y-m-d_H-i",time());
header("Content-type: text/csv; charset=ISO-8859-1");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header("Content-disposition: filename=".$filename.".csv");
echo iconv('UTF-8', 'ISO-8859-1', $csv_output);
ORIGINAL QUESTION
Hi,
I've got a form which collects data, form works ok but I've just noticed that if someone types or uses a '£' symbol, the MySQL DB ends up with '£'.
Not really sure where or how to stop this from happening, code and DB information to follow:
MySQL details
mysql> SHOW COLUMNS FROM fraud_report;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | mediumint(9) | | PRI | NULL | auto_increment |
| crm_number | varchar(32) | YES | | NULL | |
| datacash_ref | varchar(32) | YES | | NULL | |
| amount | varchar(32) | YES | | NULL | |
| sales_date | varchar(32) | YES | | NULL | |
| domain | varchar(32) | YES | | NULL | |
| date_added | datetime | YES | | NULL | |
| agent_added | varchar(32) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
8 rows in set (0.03 sec)
PHP Function
function processFraudForm($crm_number, $datacash_ref, $amount, $sales_date, $domain, $agent_added) {
// Insert Data to DB
$sql = "INSERT INTO fraud_report (id, crm_number, datacash_ref, amount, sales_date, domain, date_added, agent_added) VALUES (NULL, '$crm_number', '$datacash_ref', '$amount', '$sales_date', '$domain', NOW(), '$agent_added')";
$result = mysql_query($sql) or die (mysql_error());
if ($result) {
$outcome = "<div id=\"success\">Emails sent and database updated.</div>";
} else {
$outcome = "<div id=\"error\">Something went wrong!</div>";
}
return $outcome;
}
Example DB Entry
+----+------------+--------------+---------+------------+--------------------+---------------------+------------------+
| id | crm_number | datacash_ref | amount | sales_date | domain | date_added | agent_added |
+----+------------+--------------+---------+------------+--------------------+---------------------+------------------+
| 13 | 100xxxxxxx | 10000000 | £10.93 | 18/12/08 | blargh.com | 2008-12-22 10:53:53 | agent.name |
What you're seeing is UTF-8 encoding - it's a way of storing Unicode characters in a relatively compact format.
The pound symbol has value 0x00a3
in Unicode, but when it's written in UTF-8 that becomes 0xc2 0xa3
and that's what's stored in the database. It seems that your database table is already set to use UTF-8 encoding. This is a good thing!
If you pull the value back out from the database and display it on a UTF-8 compatible terminal (or on a web page that's declared as being UTF-8 encoded) it will look like a normal pound sign again.
£ is 0xC2 0xA3 which is the UTF-8 encoding for £ symbol - so you're storing it as UTF-8, but presumably viewing it as Latin-1 or something other than UTF-8
It's useful to know how to spot and decode UTF-8 by hand - check the wikipedia page for info on how the encoding works:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With