
MySQL or PHP is appending a Â whenever the £ is used

The answers provided have all been great. I mentioned in the comments on Alnitak's answer that I would need to take a look at my CSV generation script because, for whatever reason, it wasn't outputting UTF-8.

As was correctly pointed out, it WAS outputting UTF-8 - the problem existed with Ye Olde Microsoft Excel which wasn't picking up the encoding the way I would have liked.

My existing CSV generation looked something like:

// Create file and exit;
$filename = $file."_".date("Y-m-d_H-i",time());
header("Content-type: application/vnd.ms-excel");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header( "Content-disposition: filename=".$filename.".csv");
echo $csv_output;

It now looks like:

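// Create file and exit;
$filename = $file."_".date("Y-m-d_H-i",time());
// Declare the charset that matches the bytes actually sent below.
header("Content-type: text/csv; charset=ISO-8859-1");
// One Content-disposition header is enough: a repeated header() call with the same name replaces the earlier one.
header("Content-disposition: attachment; filename=\"".$filename.".csv\"");

// Convert the UTF-8 CSV data to ISO-8859-1 so that Excel displays £ correctly.
echo iconv('UTF-8', 'ISO-8859-1', $csv_output);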
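
To see what the iconv() call changes, here is a minimal sketch that dumps the bytes of a sample amount before and after the conversion. The hex_bytes() helper and the sample value are made up for illustration and are not part of my script; the string uses explicit byte escapes so the result does not depend on how the source file itself is saved.

```php
<?php
// Hypothetical helper (not from the CSV script): hex dump of a string's bytes.
function hex_bytes(string $s): string
{
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

// "£10.93" encoded as UTF-8: the pound sign is the two bytes C2 A3.
$utf8 = "\xC2\xA3" . "10.93";

// The same conversion the CSV script applies before echoing the data.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo hex_bytes($utf8), "\n";   // C2 A3 31 30 2E 39 33
echo hex_bytes($latin1), "\n"; // A3 31 30 2E 39 33
```

After conversion the pound sign is the single byte A3, which is what a Latin-1 reader expects. One caveat: characters that do not exist in ISO-8859-1 cannot be converted this way, so this approach only suits data that stays within that character set.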

-------------------------------------------------------

ORIGINAL QUESTION

Hi,

I've got a form which collects data. The form works OK, but I've just noticed that if someone types a '£' symbol, the MySQL DB ends up with 'Â£'.

I'm not really sure where or how to stop this from happening; code and DB information follow:

MySQL details

mysql> SHOW COLUMNS FROM fraud_report;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | mediumint(9) |      | PRI | NULL    | auto_increment |
| crm_number   | varchar(32)  | YES  |     | NULL    |                |
| datacash_ref | varchar(32)  | YES  |     | NULL    |                |
| amount       | varchar(32)  | YES  |     | NULL    |                |
| sales_date   | varchar(32)  | YES  |     | NULL    |                |
| domain       | varchar(32)  | YES  |     | NULL    |                |
| date_added   | datetime     | YES  |     | NULL    |                |
| agent_added  | varchar(32)  | YES  |     | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+
8 rows in set (0.03 sec)

PHP Function

function    processFraudForm($crm_number, $datacash_ref, $amount, $sales_date, $domain, $agent_added) {

    // Insert Data to DB
    $sql    = "INSERT INTO fraud_report (id, crm_number, datacash_ref, amount, sales_date, domain, date_added, agent_added) VALUES (NULL, '$crm_number', '$datacash_ref', '$amount', '$sales_date', '$domain', NOW(), '$agent_added')";
    $result = mysql_query($sql) or die (mysql_error());

    if ($result) {
        $outcome = "<div id=\"success\">Emails sent and database updated.</div>";
    } else {
        $outcome = "<div id=\"error\">Something went wrong!</div>";
    }

    return $outcome;
}

Example DB Entry

+----+------------+--------------+---------+------------+--------------------+---------------------+------------------+
| id | crm_number | datacash_ref | amount  | sales_date | domain             | date_added          | agent_added      |
+----+------------+--------------+---------+------------+--------------------+---------------------+------------------+
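| 13 | 100xxxxxxx | 10000000     | Â£10.93 | 18/12/08   | blargh.com         | 2008-12-22 10:53:53 | agent.name       |
+----+------------+--------------+---------+------------+--------------------+---------------------+------------------+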

asked Dec 22 '08 14:12 by suitedupgeek


2 Answers

What you're seeing is UTF-8 encoding - it's a way of storing Unicode characters in a relatively compact format.

The pound symbol has value 0x00a3 in Unicode, but when it's written in UTF-8 that becomes 0xc2 0xa3 and that's what's stored in the database. It seems that your database table is already set to use UTF-8 encoding. This is a good thing!

If you pull the value back out from the database and display it on a UTF-8 compatible terminal (or on a web page that's declared as being UTF-8 encoded) it will look like a normal pound sign again.
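
As a rough illustration of the point above, the sketch below takes the two stored bytes and shows what a viewer that assumes Latin-1 would display. It is not taken from the asker's code, and the variable names are made up for the example.

```php
<?php
// The pound sign as stored in the table: two UTF-8 bytes.
$stored = "\xC2\xA3";

// A Latin-1 viewer treats each byte as a separate character. Re-encoding
// that reading as UTF-8 reveals the characters such a viewer displays.
$misread = iconv('ISO-8859-1', 'UTF-8', $stored);

echo strlen($stored), "\n";    // 2 (bytes used for the single character £)
echo bin2hex($misread), "\n";  // c382c2a3, i.e. "Â" (c3 82) followed by "£" (c2 a3)
```

So the data itself is intact; the extra Â only appears when the bytes are read with the wrong encoding.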

answered Oct 18 '22 04:10 by Alnitak


Â£ is what the bytes 0xC2 0xA3 look like when read as Latin-1, and those two bytes are the UTF-8 encoding of the £ symbol. So you're storing it as UTF-8, but presumably viewing it as Latin-1 or something other than UTF-8.

It's useful to know how to spot and decode UTF-8 by hand. Check the Wikipedia page on UTF-8 for details of how the encoding works:

  • 0xC2 0xA3 = 110 00010 10 100011
  • The leading 110 and 10 are marker bits; the remaining bits (00010 and 100011) are the actual "payload", which gives 00010100011, or 10100011 without the leading zeros, which is 0xA3, the pound symbol.
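
The same hand decoding can be sketched in code. The decode_two_byte_utf8() function below is a hypothetical helper written for this answer, not a built-in; it handles only two-byte sequences and does not reject overlong encodings.

```php
<?php
// Hypothetical helper: decode one two-byte UTF-8 sequence (110xxxxx 10yyyyyy)
// into its Unicode code point. Longer sequences are out of scope here.
function decode_two_byte_utf8(string $bytes): int
{
    if (strlen($bytes) !== 2) {
        throw new InvalidArgumentException('Expected exactly two bytes.');
    }
    $lead = ord($bytes[0]);
    $cont = ord($bytes[1]);
    // Check the marker bits: the lead byte starts with 110, the continuation byte with 10.
    if (($lead & 0xE0) !== 0xC0 || ($cont & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('Not a two-byte UTF-8 sequence.');
    }
    // Keep the 5 payload bits of the lead byte and the 6 payload bits of the continuation byte.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

printf("U+%04X\n", decode_two_byte_utf8("\xC2\xA3")); // U+00A3
```

Masking with 0x1F and 0x3F strips the marker bits, and the shift by 6 places the lead byte's payload in front of the continuation byte's payload.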
answered Oct 18 '22 03:10 by Paul Dixon