Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing base64 encoded data as BLOB or TEXT datatype

We have a MySQL InnoDB table holding ~10 columns of small base64 encoded javascript files and png (<2KB size) images base64 encoded as well.

There are few inserts and a lot of reads comparatively, however the output is being cached on a Memcached instance for some minutes to avoid subsequent reads.

As it is right now we are using BLOB for those columns, but I am wondering if there is an advantage in switching to TEXT datatype in terms of performance or snapshot backing up.

My search digging indicates that BLOB and TEXT for my case are close to identical and since I do not know before-hand what type of data are actually going to be stored I went for BLOB.

Do you have any pointers on the TEXT vs BLOB debate for this specific case?

like image 780
Panagiotis Moustafellos Avatar asked Dec 26 '12 15:12

Panagiotis Moustafellos


People also ask

Can we store Base64 in blob?

BLOBs stored in Spreadsheet Data Model files MUST be encoded by using base64 encoding prior to any other compression and storage.

Which is better blob or Base64?

Base64 is almost exactly 8/6 times as bulky as binary (BLOB). One could argue that it is easily affordable. 3000 bytes becomes about 4000 bytes . Everyone should be able to accept arbitrary 8-bit codes, but not everybody does.

Is it good to store Base64 image in database?

While base64 is fine for transport, do not store your images base64 encoded. Base64 provides no checksum or anything of any value for storage. Base64 encoding increases the storage requirement by 33% over a raw binary format.

Is Base64 a data type?

Base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. By consisting only of ASCII characters, base64 strings are generally url-safe, and that's why they can be used to encode data in Data URLs.


1 Answers

One shouldn't store Base64-encoded data in one's database...

Base64 is a coding in which arbitrary binary data is represented using only printable text characters: it was designed for situations where such binary data needs to be transferred across a protocol or medium that can handle only printable-text (e.g. SMTP/email). It increases the data size (by 33%) and adds the computational cost of encoding/decoding, so it should be avoided unless absolutely necessary.

By contrast, the whole point of BLOB columns is that they store opaque binary strings. So just go ahead and store your stuff directly into your BLOB columns without first Base64-encoding them. (That said, if MySQL has a more suitable type for the particular data being stored, you may wish to use that instead: for example, text files like JavaScript sources could benefit from being stored in TEXT columns for which MySQL natively tracks text-specific metadata—more on this below).

The (erroneous) idea that SQL databases require printable-text encodings like Base64 for handling arbitrary binary data has been perpetuated by a large number of ill-informed tutorials. This idea appears to be seated in the mistaken belief that, because SQL comprises only printable-text in other contexts, it must surely require it for binary data too (at least for data transfer, if not for data storage). This is simply not true: SQL can convey binary data in a number of ways, including plain string literals (provided that they are properly quoted and escaped like any other string); of course, the preferred way to pass data (of any type) to your database is through parameterised queries, and the data types of your parameters can just as easily be raw binary strings as anything else.

...unless it's cached for performance reasons...

The only situation in which there might be some benefit from storing Base64-encoded data is where it's usually transmitted across a protocol requiring such encoding (e.g. by email attachment) immediately after being retrieved from the database—in which case, storing the Base64-encoded representation would save from having to perform the encoding operation on the otherwise raw data upon every fetch.

However, note in this sense that the Base64-encoded storage is merely acting as a cache, much like one might store denormalised data for performance reasons.

...in which case it should be TEXT not BLOB

As alluded above: the only difference between TEXT and BLOB columns is that, for TEXT columns, MySQL additionally tracks text-specific metadata (such as character encoding and collation) for you. This additional metadata enables MySQL to convert values between storage and connection character sets (where appropriate) and perform fancy string comparison/sorting operations.

Generally speaking: if two clients working in different character sets should see the same bytes, then you want a BLOB column; if they should see the same characters then you want a TEXT column.

With Base64, those two clients must ultimately find that the data decodes to the same bytes; but they should see that the stored/encoded data has the same characters. For example, suppose one wishes to insert the Base64-encoding of 'Hello world!' (which is 'SGVsbG8gd29ybGQh'). If the inserting application is working in the UTF-8 character set, then it will send the byte sequence 0x53475673624738676432397962475168 to the database.

  • if that byte sequence is stored in a BLOB column and later retrieved by an application that is working in UTF-16*, the same bytes will be returned—which represent '升噳扇㡧搲㥹扇全' and not the desired Base64-encoded value; whereas

  • if that byte sequence is stored in a TEXT column and later retrieved by an application that is working in UTF-16, MySQL will transcode on-the-fly to return the byte sequence 0x0053004700560073006200470038006700640032003900790062004700510068—which represents the original Base64-encoded value 'SGVsbG8gd29ybGQh' as desired.

Of course, you could nevertheless use BLOB columns and track the character encoding in some other way—but that would just needlessly reinvent the wheel, with added maintenance complexity and risk of introducing unintentional errors.


* Actually MySQL doesn't support using client character sets that are not byte-compatible with ASCII (and therefore Base64 encodings will always be consistent across any combination of them), but this example nevertheless serves to illustrate the difference between BLOB and TEXT column types and thus explains why TEXT is technically correct for this purpose even though BLOB will actually work without error (at least until MySQL adds support for non-ASCII compatible client character sets).

like image 93
eggyal Avatar answered Oct 20 '22 10:10

eggyal