I have a question about the blob
data type in MySQL.
I read that the data type can be used to store files. I also read that an alternative is to store the file on disk and include a pointer to its location in the database (via a varchar column).
But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents. So is that any different than storing a pointer to a file on the file system?
In summary, the difference between the two storage services is that Azure Blob Storage is a store for objects capable of storing large amounts of unstructured data. On the other hand, Azure File Storage is a distributed, cloud-based file system.
BLOB stands for Binary Large Objects and as its name suggests, it can be used for storing binary data while TEXT is used for storing large number of strings. BLOB can be used to store binary data that means we can store pictures, videos, sounds and programs also.
BLOB stands for a “Binary Large Object,” a data type that stores binary data. Binary Large Objects (BLOBs) can be complex files like images or videos, unlike other data strings that only store letters and numbers. A BLOB will hold multimedia objects to add to a database; however, not all databases support BLOB storage.
I read that the data type can be used to store files.
According to MySQL manual page on Blob, A BLOB
is a binary large object that can hold a variable amount of data.
Since it's a data type specific to store binary data it's common to use it to store files in binary format, being storing image files a very common use on web applications.
For web applications this would mean that you would first need to convert your file into binary format and then store it, and every time you need to retrieve your file you would need to do the reverse process of converting them back to it's original format.
Besides that, storing large amount of data in your db MAY slow it down. Specially in systems that are not dedicated only to host a database.
I also read that an alternative is to store the file on disk and include a pointer to its location in the database
Bearing in mind all above considerations a common practice for web applications is to store your files elsewhere than your MySQL and then simply store it's path on your database. This approach MAY speed up your database when dealing with large amount of data.
But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents.
In fact that would depend on what storage engine you are using since every engine treats data and stores it in different ways. For the InnoDB engine, which is suited for relational database you may want to read this article from MySQL Performance blog on how the blob is stored in MySQL.
But in abstract, on MySQL 5 and forward the blob is stored as following:
Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them.
So you are probably thinking now that the right way to go is to store them as separate file, but there are some advantages of using blob to store data, the first one (in my opinion) is the backup. I manage a small server and I had to create another subroutine only to copy my files stored as paths to another storage disk (We couldn't afford to buy a decent tape backup system). If I had designed my application to use blobs a simple mysqldump
would be everything that I needed to backup my whole database.
The advantage of storing blobs for backups are better discussed on this post where the person who answered had a similar problem than mine.
Another advantage is security and the easiness of managing permission and access. All the data inside your MySQL server is password protected and you can easily manage permissions for your users about who access what and who doesn't.
In a application which relies on MySQL privileges system for authentication and use. It's certain a plus since it would be a little harder for let's say an invader to retrieve an image (or a binary file like a zipped one) from your disk or an user without access privileges to access it.
So I'd say that
If you gonna manage your MySQL and all the data you have in it and must do regular backups or intend to change or even consider a future change of OS, and have a decent hardware and optimized your MySQL to it, go for BLOB.
If you will not manage your MySQL (as in a web host for example) and doesn't intend to change OS or make backups, stick with varchar
columns pointing to your files.
I hope it helped. Cheers
If you store data is BLOB field, you are making it part of your object abstraction.
BLOB advantages:
Should you want to remove row with BLOB, or remove it as part of master/slave table relationship or maybe the whole table hierarchy, your BLOB is handled automatically and has same lifetime as any other object in database.
Your scripts do not have a need to access anything but database to get everything they require. In many situations, having direct file access open whole can of worms on how to bypass access or security restrictions. For example, with file access, they may have to mount filesystems which contain actual files. But with BLOB in database, you only have to be able to connect to database, no matter where you are.
If you store it in file and file is replaced, removed or no longer accessible, your database would never know - in effect, you cannot guarantee integrity. Also, it is difficult to reliably support multiple versions when using files. If you use and depend on transactions, it becomes almost impossible.
File advantages:
Some databases handle BLOBs rather poorly. For example, while official BLOB limit in MySQL is 4GB, but in reality it is only 1MB in default configuration. You can increase this to 16-32MB by tweaking both client and server configuration to increase MySQL command buffer, but this has a lot of other implications in terms of performance and security.
Even if database does not have some weird size limits, it always will have some overhead in storing BLOB compared to just a file. Also, if BLOB is large, some databases do not provide interface to access blob piece by piece, or stream
it, which can be large impediment for your workflow.
In the end, it is up to you. I typically try to keep it in BLOB, unless this creates unreasonable performance problems.
Yes, MySQL blobs that don't fit within the same page as a row get stored on overflow pages Note that some blobs are small enough that they're stored with the rest of the row, like any other column. The blob pages are not adjacent to the page their row is stored on, so they may result in extra I/O to read them.
On the other hand, just like with any other page type, blob pages can occupy memory in the InnoDB buffer pool, so reading the blobs subsequently is very fast even if they are on separate pages. Files can be cached by the operating system, but typically they're read from disk.
Here are a few other factors that may affect your decision:
Blobs are stored logically with a row. This means if you DELETE the row, the associated blob is deleted automatically. But if you store the blob outside the database, you end up with orphaned blob files after you delete rows from the database. You have to do manual steps to find and delete these files.
Blobs stored in the row also follow transaction semantics. For instance, a new blob or an updated blob is invisible to other transactions until you commit. You can also roll back a change. Storing blobs in files outside the database makes this a lot harder.
When you back up a database containing blobs, the database is a lot bigger of course, but when you backup, you get all the data and associated blobs in one step. If you store blobs externally, you have to back up the database and also back up the filesystem where you store blob files. If you need to ensure that the data and blobs are captured from one instant in time, you pretty much need to use some kind of filesystem snapshots.
If you use replication, the only automatic way of ensuring the blobs get copied to the replication slave automatically is to store blobs in the database.
Filesystem access will be faster than through the database. Blobs columns have some disadvantages in terms of indexing/sorting etc, which you could do with your filename column if you wished to in the future.
The database can also grow quickly with large blobs and then tasks like backing up become slower. I would go with a file location in database with the physical storage on the file system.
The better approach is to store your file in the filesystem folder and point to their paths through a varchar field in the database. One of the drawbacks of saving files in the database is slowing it or reducing its performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With