Optimal way to store BLOBs larger than max_allowed_packet in MySQL InnoDB

Tags:

mysql

innodb

blob

Maybe this question should be asked on https://dba.stackexchange.com/ instead, I'm not sure. Please advise in comments or move it there.

For this project I'm using MySQL 5.6.19 hosted at Amazon RDS.

Summary

I'm going to store photos in the database in a BLOB column in an InnoDB table, and I would like to know the optimal way to do it. I'm looking for official documentation or some method(s) that would allow me to compare different variants.

When searching for this topic there are many discussions and questions about whether it is better to store binary files in a database BLOB or in the file system, with the database holding only file paths and names. That discussion is beyond the scope of this question. For this project I need consistency and referential integrity, so files are going to be stored in BLOBs; the question is about the details of how exactly to do it.

Database schema

Here is the relevant part of the schema (so far). There is a table Contracts with some general information about each contract and primary ID key. For each Contract there can be several (~10) photos taken, so I have a table ContractPhotos:

CREATE TABLE `ContractPhotos` (
  `ID` int(11) NOT NULL,
  `ContractID` int(11) NOT NULL,
  `PhotoDateTime` datetime NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `IX_ContractID` (`ContractID`),
  CONSTRAINT `FK_ContractPhotos_Contracts` FOREIGN KEY (`ContractID`) REFERENCES `Contracts` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

For each photo I will store original full resolution image plus few scaled down versions, so I have a table ContractPhotoVersions:

CREATE TABLE `ContractPhotoVersions` (
  `ID` int(11) NOT NULL,
  `ContractPhotoID` int(11) NOT NULL,
  `PhotoVersionTypeID` int(11) NOT NULL,
  `PhotoWidth` int(11) NOT NULL,
  `PhotoHeight` int(11) NOT NULL,
  `FileSize` int(11) NOT NULL,
  `FileMD5` char(32) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `IX_ContractPhotoID` (`ContractPhotoID`),
  CONSTRAINT `FK_ContractPhotoVersions_ContractPhotos` FOREIGN KEY (`ContractPhotoID`) REFERENCES `ContractPhotos` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Finally, there is a table that holds the actual binary data of all images. I know that MySQL allows storing up to 4GB in a LONGBLOB column, but during my search I came across another MySQL limitation: max_allowed_packet. On my instance of MySQL this variable is 4MB. My understanding after reading the docs is that, effectively, a single row can't exceed 4MB, because it has to fit in one packet. It is quite normal for a photo to be larger than 4MB, so in order to be able to INSERT and SELECT such files I intend to split each file into small chunks:

CREATE TABLE `PhotoChunks` (
  `ID` int(11) NOT NULL,
  `ContractPhotoVersionID` int(11) NOT NULL,
  `ChunkNumber` int(11) NOT NULL,
  `ChunkSize` int(11) NOT NULL,
  `ChunkData` blob NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `IX_ContractPhotoVersionID_ChunkNumber` (`ContractPhotoVersionID`,`ChunkNumber`),
  CONSTRAINT `FK_PhotoChunks_ContractPhotoVersions` FOREIGN KEY (`ContractPhotoVersionID`) REFERENCES `ContractPhotoVersions` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Besides, I will be able to upload large photos into the database a few chunks at a time and resume the upload when the connection drops.
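To make the intent concrete, here is a minimal sketch of the split/reassemble logic (in Python for brevity; my actual app uses the MySQL C API, but the chunking itself is language-agnostic). The 64KB chunk size and the helper names are illustrative placeholders, not a settled choice:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # placeholder; choosing this value is the whole question


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_number, chunk) pairs, 1-based, for the PhotoChunks table."""
    for number, offset in enumerate(range(0, len(data), chunk_size), start=1):
        yield number, data[offset:offset + chunk_size]


def reassemble(chunks):
    """Concatenate chunks ordered by ChunkNumber back into the original file."""
    return b"".join(chunk for _, chunk in sorted(chunks))


def file_md5(data: bytes) -> str:
    """Value for the ContractPhotoVersions.FileMD5 column (32-char hex)."""
    return hashlib.md5(data).hexdigest()
```

Resuming an interrupted upload would then just be a matter of asking the server for MAX(ChunkNumber) for that photo version and continuing from the next chunk.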

Data volume

The estimated volume of data is 40,000 full-resolution photos at ~5MB each => 200GB. The scaled-down versions will most likely be 800x600 at ~120KB each => an extra ~5GB. Images will not be UPDATEd; they will eventually be deleted after several years.

Question

There are many ways to split a file into smaller chunks: you can split it into 4KB, 8KB, 64KB, etc. What would be the optimal way when using InnoDB storage engine to minimize wasted space first and overall performance second?

I found these docs: http://dev.mysql.com/doc/refman/5.6/en/innodb-file-space.html, but there is not much detail about BLOB. It says that page size is 16KB.

The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes.

I really expected official documentation to be more precise than about 8000 bytes. The following paragraph is most interesting:

If a row is less than half a page long, all of it is stored locally within the page. If it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page. For a column chosen for off-page storage, InnoDB stores the first 768 bytes locally in the row, and the rest externally into overflow pages. Each such column has its own list of overflow pages. The 768-byte prefix is accompanied by a 20-byte value that stores the true length of the column and points into the overflow list where the rest of the value is stored.

Considering the above there can be at least these strategies:

  • choose such chunk size that it will be stored locally within the page without involving off-page storage.
  • choose such chunk size that the whole BLOB is stored off-page.
  • I don't like the idea of storing a BLOB partially within the page and partially off-page. But, hey, maybe I'm wrong.

I also came across this doc https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-dynamic.html and at this point I realised that I wanted to ask this question. It is too overwhelming for me right now, and I hope there is somebody who has practical experience with this topic.

I don't want to end up wasting half of the disk space by inadvertently choosing a poor chunk size and row format. My concern is that if I choose to store 8000 bytes per chunk, plus 16 bytes for the 4 INT columns in the same row of the PhotoChunks table, the row would exceed that magic half of the page size, and I would end up spending 16KB per row for only 8000 bytes of data.

Is there a way to check how much space is actually wasted in this way? In the Amazon RDS environment I'm afraid there is no way to have a look at the actual files that the InnoDB table consists of. Otherwise, I would simply try different variants and see the final file size.
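Absent access to the data files, the best I can do is a back-of-envelope model. This sketch estimates how many chunk rows fit in a 16KB page, and what fraction of the page is wasted, under the "stored locally if under about half a page" rule quoted above. All the overhead constants are my rough guesses, not numbers from the documentation:

```python
PAGE_SIZE = 16 * 1024
PAGE_OVERHEAD = 128   # rough guess: page header/trailer + record directory
ROW_OVERHEAD = 30     # rough guess: record header + the 4 INT columns
HALF_PAGE = 8000      # "about 8000 bytes" per the InnoDB docs


def rows_per_page(chunk_size: int) -> int:
    """How many locally stored chunk rows fit in one page (0 = pushed off-page)."""
    row = chunk_size + ROW_OVERHEAD
    if row > HALF_PAGE:
        return 0  # the BLOB column would be chosen for off-page storage instead
    return (PAGE_SIZE - PAGE_OVERHEAD) // row


def waste_fraction(chunk_size: int) -> float:
    """Fraction of the page not occupied by actual chunk data."""
    n = rows_per_page(chunk_size)
    if n == 0:
        return float("nan")
    return 1 - (n * chunk_size) / PAGE_SIZE
```

With these guesses, ~7900-byte chunks give two rows per page (~3.5% waste), while anything slightly over ~8000 bytes forces off-page storage. The constants would need to be verified against the real on-disk format before trusting the numbers.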

So far I can see that there are two parameters: the row format and chunk size. Maybe there are other things to consider.

Edit

Why I don't consider changing the max_allowed_packet variable. From the doc:

Both the client and the server have their own max_allowed_packet variable, so if you want to handle big packets, you must increase this variable both in the client and in the server.

I use the MySQL C API to work with this database, and the same C++ application talks to 200 other MySQL servers (completely unrelated to this project) using the same libmysql.dll. Some of these servers are still MySQL 3.23, so my app has to work with all of them. Frankly speaking, I haven't looked into the docs on how to change the max_allowed_packet variable on the client side of the MySQL C API.

Edit 2

@akostadinov pointed out that there is mysql_stmt_send_long_data() to send BLOB data to the server in chunks, and people have said that they managed to INSERT BLOBs larger than max_allowed_packet. Still, even if I manage to INSERT, say, a 20MB BLOB with max_allowed_packet=4MB, how do I SELECT it back? I don't see how I can do it.
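One idea I can think of, though I haven't tested it: SUBSTRING() counts bytes on binary strings, so the server could be asked for the blob in windows smaller than max_allowed_packet. A sketch that just generates those statements (the table and column names here are hypothetical, not from my schema):

```python
def windowed_select_sql(column: str, table: str, row_id: int,
                        total_size: int, window: int):
    """Generate SELECT statements that read `column` in windows of `window` bytes.

    SUBSTRING() positions are 1-based; on binary strings MySQL counts bytes,
    so concatenating the results client-side reconstructs the blob.
    """
    return [
        f"SELECT SUBSTRING(`{column}`, {pos}, {window}) "
        f"FROM `{table}` WHERE `ID` = {row_id}"
        for pos in range(1, total_size + 1, window)
    ]
```

Each result would then fit in one packet, at the cost of one round trip per window; whether the server materialises the whole blob for every such SELECT is a separate question.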

I would appreciate it if you pointed me to the right direction.

asked Apr 13 '15 by Vladimir Baranov


2 Answers

I stand by my answer on forums.mysql.com from 2 years ago. Some further notes:

  • 16M is likely to work for max_allowed_packet, however I have no evidence that it works beyond that.
  • In an application I worked on several years ago, a chunk size of about 50KB seemed 'optimal'.
  • max_allowed_packet can be set in /etc/my.cnf. But if you don't have access to that, you are stuck with its value. You can read it in any(?) version by doing SHOW VARIABLES LIKE 'max_allowed_packet'. (I'm reasonably sure back to 4.0, but not sure about 3.23.) So that could be an upper limit on your chunk size.
  • InnoDB will split big BLOB/TEXT fields into 16KB blocks. Probably each block has some overhead, so you don't get exactly 16KB.
  • Antelope versus Barracuda, and other settings, control whether the first 768 bytes of the BLOB are stored in the record. If none is stored there, there is a 20-byte pointer to the off-page storage.
  • Today, 16MB may seem like a reasonable limit for picture size; tomorrow it won't.
  • If you are running a new enough version of MySQL, innodb_page_size can be raised from 16K to 32K or 64K. (And the ~8000 goes up to ~16000, but not ~32000.)
  • If replication is involved, chunking becomes more important. But there can be some extra tricky business with the 'sequence number' for the chunks. (Ask me if you need to go this direction.)
  • Adding the above comments together, I suggest a chunk size of MIN(64700, max_allowed_packet) bytes as a reasonable compromise, even if you can't control innodb_file_format. Only 1-2% of disk space will be wasted inside this "photos" table (assuming pictures of, say, about 1MB).
  • Compression is useless; JPGs are already compressed.
  • Most of the time is in I/O; second most is in network chatter between client and server. The point here is... C vs PHP will not make much difference when it comes to performance.
  • The ~8000 bytes per record is irrelevant in this discussion. That applies to a table with lots of columns -- they can't add up to more than ~8K. Most of the BLOB will go off-page, leaving only 60-800 bytes per row, hence 15-200 rows per 16KB block (average, after other types of overhead).
  • PARTITION is unlikely to be of any use.
  • Is "chunking a premature optimization"? It is not an "optimization" if you are hitting a brick wall because of max_allowed_packet.
answered Sep 28 '22 by Rick James


One approach to try is using long send as described here: Is there any way to insert a large value in a mysql DB without changing max_allowed_packet?

Another approach, as you suggest, is to split data into chunks. See one possible approach in this thread: http://forums.mysql.com/read.php?20,601656,601656

Another, given that you set some maximum image size limit in your user interface, is to increase the packet size accordingly. Do you allow images larger than 16MB?

If you ask me, I'd avoid implementing chunking as it looks more like a premature optimization instead of letting DB do its own optimizations.

answered Sep 28 '22 by akostadinov