Maybe this question should be asked on https://dba.stackexchange.com/ instead, I'm not sure. Please advise in comments or move it there.
For this project I'm using MySQL 5.6.19 hosted at Amazon RDS.
Summary
I'm going to store photos in the database in a BLOB column of an InnoDB table, and I would like to know the optimal way to do it. I'm looking for official documentation or some method that would allow comparing different variants.
When searching for this topic, there are a lot of discussions and questions about whether it is better to store binary files in a database BLOB or in the file system with the database holding only file paths and names. Such discussion is beyond the scope of this question. For this project I need consistency and referential integrity, so the files are going to be stored in BLOB columns; the question is about the details of how exactly to do it.
Database schema
Here is the relevant part of the schema (so far). There is a table Contracts with some general information about each contract and a primary key ID.
For each Contract there can be several (~10) photos taken, so I have a table ContractPhotos:
CREATE TABLE `ContractPhotos` (
`ID` int(11) NOT NULL,
`ContractID` int(11) NOT NULL,
`PhotoDateTime` datetime NOT NULL,
PRIMARY KEY (`ID`),
KEY `IX_ContractID` (`ContractID`),
CONSTRAINT `FK_ContractPhotos_Contracts` FOREIGN KEY (`ContractID`) REFERENCES `Contracts` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
For each photo I will store the original full-resolution image plus a few scaled-down versions, so I have a table ContractPhotoVersions:
CREATE TABLE `ContractPhotoVersions` (
`ID` int(11) NOT NULL,
`ContractPhotoID` int(11) NOT NULL,
`PhotoVersionTypeID` int(11) NOT NULL,
`PhotoWidth` int(11) NOT NULL,
`PhotoHeight` int(11) NOT NULL,
`FileSize` int(11) NOT NULL,
`FileMD5` char(32) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
PRIMARY KEY (`ID`),
KEY `IX_ContractPhotoID` (`ContractPhotoID`),
CONSTRAINT `FK_ContractPhotoVersions_ContractPhotos` FOREIGN KEY (`ContractPhotoID`) REFERENCES `ContractPhotos` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Finally, there is a table that holds the actual binary data of all images. I know that MySQL allows storing up to 4GB in a LONGBLOB column, but during my search I came across another MySQL limitation: max_allowed_packet. On my instance of MySQL this variable is 4MB. My understanding of this variable after reading the docs is that, effectively, a single row can't exceed 4MB. It is pretty normal to have a photo that is more than 4MB, so in order to be able to INSERT and SELECT such files I intend to split each file into small chunks:
CREATE TABLE `PhotoChunks` (
`ID` int(11) NOT NULL,
`ContractPhotoVersionID` int(11) NOT NULL,
`ChunkNumber` int(11) NOT NULL,
`ChunkSize` int(11) NOT NULL,
`ChunkData` blob NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `IX_ContractPhotoVersionID_ChunkNumber` (`ContractPhotoVersionID`,`ChunkNumber`),
CONSTRAINT `FK_PhotoChunks_ContractPhotoVersions` FOREIGN KEY (`ContractPhotoVersionID`) REFERENCES `ContractPhotoVersions` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
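The splitting itself can be sketched as follows. This is a minimal Python sketch: the chunk size is just a candidate value, and the INSERT shown in the comment simply mirrors the PhotoChunks columns above.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # candidate chunk size; the optimal value is the open question


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_number, chunk_size, chunk_data) tuples, one per PhotoChunks row."""
    for number, start in enumerate(range(0, len(data), chunk_size), start=1):
        chunk = data[start:start + chunk_size]
        yield number, len(chunk), chunk


# Each tuple maps onto a parameterized statement such as:
#   INSERT INTO PhotoChunks (ID, ContractPhotoVersionID, ChunkNumber, ChunkSize, ChunkData)
#   VALUES (?, ?, ?, ?, ?)
photo = b"\x89PNG" + b"\x00" * 200_000           # stand-in for real image bytes
chunks = list(split_into_chunks(photo))
# A SELECT ... ORDER BY ChunkNumber would rebuild the file the same way:
reassembled = b"".join(data for _, _, data in chunks)

assert reassembled == photo
assert hashlib.md5(reassembled).hexdigest() == hashlib.md5(photo).hexdigest()
```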
Besides, I will be able to upload large photos into the database a few chunks at a time and resume the upload when the connection drops.
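Resuming an interrupted upload could then amount to asking the server which chunks already arrived. A sketch of that bookkeeping (the SELECT in the docstring is illustrative, not code from this project):

```python
def next_chunk_to_upload(uploaded_chunk_numbers):
    """Given the chunk numbers already stored (e.g. from
    SELECT ChunkNumber FROM PhotoChunks WHERE ContractPhotoVersionID = ?),
    return the first missing 1-based chunk number to resume from."""
    have = set(uploaded_chunk_numbers)
    n = 1
    while n in have:
        n += 1
    return n


assert next_chunk_to_upload([]) == 1          # nothing stored yet: start at 1
assert next_chunk_to_upload([1, 2, 3]) == 4   # continue after the last chunk
assert next_chunk_to_upload([1, 3]) == 2      # hole left by a dropped connection
```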
Data volume
The estimated volume of data is 40,000 full-resolution photos at ~5MB each => 200GB. The scaled-down versions will most likely be 800x600 at ~120KB each => an extra ~5GB. The images will not be UPDATEd. They will eventually be deleted, after several years.
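A quick sanity check of these numbers, and of how many PhotoChunks rows a given chunk size would imply (the candidate sizes here are arbitrary):

```python
full_res = 40_000 * 5 * 1024**2   # ~200 GB of originals
thumbs   = 40_000 * 120 * 1024    # ~4.6 GB of 800x600 versions
total    = full_res + thumbs


def rows_needed(total_bytes, chunk_size):
    """Ceiling division: how many chunk rows the whole data set produces."""
    return -(-total_bytes // chunk_size)


for chunk in (4096, 8192, 65536):
    print(chunk, rows_needed(total, chunk))
```

Tens of millions of small rows versus a few million larger ones is part of why the chunk size matters for overall performance, not just for wasted space.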
Question
There are many ways to split a file into smaller chunks: 4KB, 8KB, 64KB, etc. What would be the optimal chunk size with the InnoDB storage engine, minimizing wasted space first and maximizing overall performance second?
I found these docs: http://dev.mysql.com/doc/refman/5.6/en/innodb-file-space.html, but there is not much detail about BLOBs. It says that the page size is 16KB.
The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes.
I really expected the official documentation to be more precise than "about 8000 bytes". The following paragraph is most interesting:
If a row is less than half a page long, all of it is stored locally within the page. If it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page. For a column chosen for off-page storage, InnoDB stores the first 768 bytes locally in the row, and the rest externally into overflow pages. Each such column has its own list of overflow pages. The 768-byte prefix is accompanied by a 20-byte value that stores the true length of the column and points into the overflow list where the rest of the value is stored.
Considering the above, there are at least several possible strategies for choosing the chunk size and row format.
I also came across this doc: https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-dynamic.html, and at this point I realised that I want to ask this question. It is too overwhelming for me now, and I hope that somebody has practical experience with this topic.
I don't want to end up wasting half of the disk space by inadvertently choosing a poor chunk size and row format. My concern is that if I store 8000 bytes of chunk data plus 16 bytes for the four INT columns in the same PhotoChunks row, it would exceed that magic half of the page size, and I would end up spending 16KB per row for only 8000 bytes of data.
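That concern can be sanity-checked with rough arithmetic. The per-row overhead constant below is an assumption for illustration, not an exact InnoDB figure:

```python
IN_PAGE_LIMIT = 8000   # the "about 8000 bytes" in-page row limit from the InnoDB docs
ROW_OVERHEAD = 30      # assumed per-row header/pointer cost; illustrative only
FIXED_COLS = 4 * 4     # the four INT columns of a PhotoChunks row


def fits_in_page(chunk_size):
    """Rough estimate: does a PhotoChunks row with this chunk stay in-page?"""
    return chunk_size + FIXED_COLS + ROW_OVERHEAD <= IN_PAGE_LIMIT


assert fits_in_page(4096)        # a 4KB chunk comfortably stays in-page
assert not fits_in_page(8000)    # an 8000-byte chunk tips past the limit
```

Under these assumed numbers, a chunk around 8000 bytes is exactly the borderline case, which is why the precise limit matters.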
Is there a way to check how much space is actually wasted this way? In the Amazon RDS environment, I'm afraid there is no way to look at the actual files that an InnoDB table consists of. Otherwise, I would simply try different variants and compare the final file sizes.
So far I can see that there are two parameters: the row format and chunk size. Maybe there are other things to consider.
Edit
Why I don't consider changing the max_allowed_packet variable. From the docs:
Both the client and the server have their own max_allowed_packet variable, so if you want to handle big packets, you must increase this variable both in the client and in the server.
I use the MySQL C API to work with this database, and the same C++ application talks to 200 other MySQL servers (completely unrelated to this project) using the same libmysql.dll. Some of these servers are still MySQL 3.23, so my app has to work with all of them. Frankly speaking, I haven't looked into how to change the max_allowed_packet variable on the client side of the MySQL C API.
Edit 2
@akostadinov pointed out that there is mysql_stmt_send_long_data() to send BLOB data to the server in chunks, and people have said that they managed to INSERT BLOBs larger than max_allowed_packet. Still, even if I manage to INSERT, say, a 20MB BLOB with max_allowed_packet=4MB, how do I SELECT it back? I don't see how I can do it.
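The only piecewise read I can think of is SELECTing the value with SUBSTRING so that no single result exceeds max_allowed_packet. Here is a local simulation of that idea (the helper names are made up, and I haven't verified this against a real server):

```python
MAX_PACKET = 4 * 1024 * 1024   # the 4MB server-side limit from above


def select_substring(value: bytes, pos: int, length: int) -> bytes:
    """Simulates SELECT SUBSTRING(col, pos, length): MySQL's SUBSTRING is 1-indexed."""
    return value[pos - 1:pos - 1 + length]


def fetch_piecewise(value: bytes, piece: int = MAX_PACKET // 2) -> bytes:
    """Rebuild a large value by issuing repeated small SUBSTRING reads."""
    out, pos = b"", 1
    while True:
        part = select_substring(value, pos, piece)
        if not part:               # empty result: we read past the end
            break
        out += part
        pos += len(part)
    return out


blob = bytes(range(256)) * 100_000   # ~25MB stand-in for a large LONGBLOB
assert fetch_piecewise(blob) == blob
```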
I would appreciate it if you pointed me in the right direction.
I stand by my answer in forums.mysql.com of 2 years ago. Some further notes:
- max_allowed_packet: however, I have no evidence that it works beyond that.
- SHOW VARIABLES LIKE 'max_allowed_packet'. (I'm reasonably sure back to 4.0, but not sure about 3.23.) So that could be an upper limit on your chunk size.
- innodb_page_size can be raised from 16K to 32K or 64K. (And the ~8000 goes up to ~16000, but not ~32000.)
- innodb_file_format: only 1-2% of disk space will be wasted inside this "photos" table (assuming pictures of, say, about 1MB).
- PARTITION is unlikely to be of any use.
.One approach to try is using long send as described here: Is there any way to insert a large value in a mysql DB without changing max_allowed_packet?
Another approach, as you suggest, is to split data into chunks. See one possible approach in this thread: http://forums.mysql.com/read.php?20,601656,601656
Another is, given that you set some maximum image size limit in your user interface, to increase the packet size accordingly. Do you allow images larger than 16MB?
If you ask me, I'd avoid implementing chunking, as it looks more like a premature optimization than letting the DB do its own optimizations.