Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Exported blob data different than DB data

I'm attempting to export our very large MySQL database (1.6GB - mostly BLOB) and import into a new server. I have worked through most of the problems and finally completed the import without any errors. Using MySQL Query Browser I ran a query on a table with BLOBs of images and saved one to disk (using the save icon in query browser). When I tried to open the file I received an "invalid image format" error. Uh oh.

Using query browser I inspected the value on the source database and the new, recently imported database. The values are different, I think. It could just be encoding issues or something. Here is what I see:

Source (valid data) server:

FF D8 FF E0    00 10 4A 46    49 46 00 01    01 01 00 60
00 60 00 00    FF DB 00 43    00 08 06 06    07 06 05 08
and so on...

New server:

C3 BF C3 98    C3 BF C3 A0    00 10 4A 46    49 46 00 01
01 01 00 60    00 60 00 00    C3 BF C3 9B    00 43 00 08
and so on...

In this example it appears to my newbie eye that there are 3 bytes of extra data in the front of the data on the "new" server.

I then checked out the sql dump file using 010 editor. I found the line for this particular example and here is what I see:

FF D8 FF E0    5C 30 10 4A    46 49 46 5C    30 01 01 01
5C 30 60 5C    30 60 5C 30    5C 30 FF DB    5C 30 43 5C
30 08 06 06    07 06 05 08    and so on...

Now I'm way over my head. I see the pattern, I do notice that the HEX pair 5C 30 appears to be the same as 00 but I don't understand WHY. At this point I have a source server that is about to be wiped and a new one that I'm afraid has corrupt data imported into it. I'm hoping this is some kind of encoding issue that can be solved by setting a global variable in MySQL but I really don't know.

I should also mention that when I save the files from the source (working) server and the new (corrupt) server the files size is about 40% larger for the corrupt file.

I checked the character set variables on both servers:

SHOW VARIABLES LIKE '%char%'

source server:

character_set_client      utf8
character_set_connection  utf8
character_set_database    latin1
character_set_filesystem  binary
character_set_results     utf8
character_set_server      latin1
character_set_system      utf8

new server:

character_set_client      utf8
character_set_connection  utf8
character_set_database    latin1
character_set_filesystem  binary
character_set_results     utf8
character_set_server      latin1
character_set_system      utf8

They are the same.

like image 917
scubasteve Avatar asked Nov 30 '12 18:11

scubasteve


1 Answers

The corrupted data from the new database looks like the result of converting the source data from ISO‑8859‑1 to UTF‑8 (e.g. U+00FF — ÿ — is FF in the former and C3 BF in the latter).

Since BLOBs have no character set, the character encoding is not controlled by server variables; I suspect mysqldump is outputting your BLOB data in a UTF-8–encoded file (which is the default), and it is encoded along the way somehow, through some combination of server settings and options passed to mysqldump.

The best solution may be to use the --hex-blob option when exporting BLOB fields, which would result in something like:

INSERT INTO `table` VALUES (0xFFD8FFE0...);
like image 119
cmbuckley Avatar answered Oct 05 '22 03:10

cmbuckley