Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is a BLOB converted using the current/default charset in MySQL?

  1. I have a table with a BLOB field.
  2. The charset of the table is Latin1.
  3. I connect to the DB and "SET CHARACTER SET utf8".
  4. Then I save binary data into the field.
  5. Then I retrieve the data, and it's not what I saved (corrupt).

The code:

<?php
$pdo = new \PDO("mysql:host=127.0.0.1;dbname=***", '***', '***');

$pdo->exec('SET CHARACTER SET utf8');

$sql = "INSERT INTO pdo_blob (the_blob) VALUES(:the_blob)";
$insertStm = $pdo->prepare($sql);

$blob = (binary) file_get_contents('/home/***/test.pdf');
$insertStm->bindParam(":the_blob", $blob, \PDO::PARAM_LOB);
$insertStm->execute();

$selectStm = $pdo->prepare("SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1");
$selectStm->execute();

$savedBlob = null;
$selectStm->bindColumn(1, $savedBlob, \PDO::PARAM_LOB);
$selectStm->fetch();

echo 'equal: ' . ((int) ($blob == $savedBlob));
like image 499
user151851 Avatar asked Feb 06 '13 17:02

user151851


People also ask

How is BLOB stored in MySQL?

In MySQL, we can use BLOB datatype to store the files. A BLOB is a binary large object that can hold a variable amount of data. We can represent the files in binary format and then store them in our database. The four BLOB types are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB.

Is BLOB a data type in MySQL?

BLOB, which stands for a Binary Large Object, is a MySQL data type that can store images, PDF files, multimedia, and other types of binary data.

What is default character set in MySQL?

The default MySQL server character set and collation are latin1 and latin1_swedish_ci , but you can specify character sets at the server, database, table, column, and string literal levels.

What is the datatype for BLOB?

A BLOB (binary large object) is a varying-length binary string that can be up to 2,147,483,647 characters long. Like other binary types, BLOB strings are not associated with a code page. In addition, BLOB strings do not hold character data.


1 Answers

Short Answer:

Simply delete or comment out line below, and it will always work, no matter which database encoding is really in use (utf8, latin1, etc):

$pdo->exec('SET CHARACTER SET utf8');

Long Answer:

This is not PDO bug, this is MySQL bug.

When actual database encoding is latin1, but you use:

SET CHARACTER SET utf8

(or vice versa: actual is utf8, but you use latin1 - important part is that it is different), then, as far as I can tell, MySQL will try to perform charset conversion for all traffic between client and server (even for BLOB!).

If you DO NOT use SET CHARACTER SET statement, from what I see for scripts (PHP/PDO or Perl/DBI) connection charset by default is set to be the database charset, and in that case no implicit conversion takes place.

Obviously, this automatic conversion is what kills BLOBs, which do not want any conversion to happen.

I have tested this on both PHP/PDO and Perl/DBI, and issue is easily reproducible: both will fail if using database with latin1 encoding and using SET CHARACTER SET utf8 (or vice versa).

If you want to be fully UTF8 compatible, you should change encoding of your database using:

ALTER DATABASE mydb CHARSET utf8;

With this, everything will be using UTF8, and BLOBs will also work fine.

Minimal file that causes this corruption problem is blob.bin with single byte 0xFF. On Linux, you can create this test file using printf command:

printf "0xFF" > blob.bin

Now, test scripts that reproduce the problem:

PHP test code:

<?php
$dbh = new PDO("mysql:host=127.0.0.1;dbname=test");
# If database encoding is NOT utf8, uncomment to break it:
# $dbh->exec("SET CHARACTER SET utf8");

$blob1 = file_get_contents("blob.bin");
$sth = $dbh->prepare(
    "INSERT INTO pdo_blob (the_blob) VALUES(:the_blob)"
);
$sth->bindParam(":the_blob", $blob1, PDO::PARAM_LOB);
$sth->execute();

$sth = $dbh->prepare(
    "SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1"
);
$sth->execute();

$blob2 = null;
$sth->bindColumn(1, $blob2, PDO::PARAM_LOB);
$sth->fetch();

if ($blob1 == $blob2) {
    echo "Equal\n";
} else {
    echo "Not equal\n";
    $arr1 = str_split($blob1);
    $arr2 = str_split($blob2);
    $i=0;
    for ($i=0; $i<count($arr1); $i++) {
        if ($arr1[$i] != $arr2[$i]) {
            echo "First diff: " . dechex(ord($arr1[$i])) . " != "
                                . dechex(ord($arr2[$i])) . "\n";
            break;
        }
    }
}
?>

Perl test code:

#!/usr/bin/perl -w

use strict;
use DBI qw(:sql_types);

my $dbh = DBI->connect("dbi:mysql:host=127.0.0.1;dbname=test");
# If database encoding is NOT utf8, uncomment to break it:
# $dbh->do("SET CHARACTER SET utf8");
open FILE, "blob.bin";
binmode FILE;
read(FILE, my $blob1, 100000000);
close FILE;
my $sth = $dbh->prepare(
    "INSERT INTO pdo_blob (the_blob) VALUES(?)"
);
$sth->bind_param(1, $blob1, SQL_BLOB);
$sth->execute();
my ($blob2) = $dbh->selectrow_array(
    "SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1"
);
print ($blob1 eq $blob2 ? "Equal" : "Not equal") , "\n";
like image 198
mvp Avatar answered Sep 28 '22 06:09

mvp