I've got a design decision to make and am looking for some best-practice advice. I have a Java program which needs to store a large number (a few hundred a day) of floating-point arrays in a MySQL database. The data is a fixed-length Double array of length 300. I can see three reasonable options:
I should also mention that this data will be read from and updated frequently.
I want to use a BLOB since that is what I have done in the past and it seems like the most efficient method (e.g., it maintains fixed width and there is no need to convert to a comma-separated string). However, my coworker is insisting that we should serialize and use varchar, for reasons which seem mostly dogmatic.
If one of these methods is better than the other, are the reasons Java or MySQL specific?
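For reference, the BLOB option described above could be sketched roughly as follows. This is only a minimal illustration: the class name VectorCodec and its methods are hypothetical, and it assumes the array is packed as raw big-endian 8-byte doubles, which gives a fixed 2400 bytes for 300 elements.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not from the original post: packs a double[] into a
// fixed-width byte[] (8 bytes per element) and unpacks it again.
final class VectorCodec {
    private VectorCodec() {}

    // Encode each double as 8 bytes; ByteBuffer is big-endian by default.
    static byte[] toBytes(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    // Decode a byte[] produced by toBytes back into a double[].
    static double[] fromBytes(byte[] bytes) {
        if (bytes.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8: " + bytes.length);
        }
        double[] values = new double[bytes.length / Double.BYTES];
        ByteBuffer.wrap(bytes).asDoubleBuffer().get(values);
        return values;
    }
}
```

The resulting byte[] could then be bound to a BLOB or VARBINARY column with PreparedStatement.setBytes and read back with ResultSet.getBytes.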
The easiest way to store array-type data in MySQL is to use the JSON data type. The JSON data type was first added in MySQL version 5.7.8, and you can use it for storing JSON arrays and objects.
MySQL doesn't have an array data type. This is a fundamental problem in architectures where storing denormalized rows is a requirement, for example, where MySQL is (also) used for data warehousing.
BLOB, which stands for a Binary Large Object, is a MySQL data type that can store images, PDF files, multimedia, and other types of binary data.
A Blob is used to store very large data objects of indeterminate and variable size, such as bit-mapped graphics images, vector drawings, sound files, video segments, chapter or book-length documents, or any other kind of multimedia information.
Is there a reason you don't create a child table so you can store one floating point value per row, instead of an array?
Say you store a thousand arrays of 300 elements each per day. That's 300,000 rows per day, or 109.5 million per year. Nothing to sneeze at, but within the capabilities of MySQL or any other RDBMS.
Re your comments:
Sure, if the order is significant you add another column for the order. Here's how I'd design the table:
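CREATE TABLE VectorData (
  trial_id  INT NOT NULL,
  vector_no SMALLINT UNSIGNED NOT NULL,
  order_no  SMALLINT UNSIGNED NOT NULL,  -- position of the element within its vector
  element   FLOAT NOT NULL,
  -- order_no must be part of the key, otherwise each vector could hold only one row
  PRIMARY KEY (trial_id, vector_no, order_no),
  FOREIGN KEY (trial_id) REFERENCES Trials (trial_id)
);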
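On the Java side, turning one array into rows for this table might look something like the sketch below. The class VectorRows, its Row type, and the zero-based order_no are my own assumptions for illustration, not something the table definition dictates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands one vector into one row per element,
// mirroring the VectorData columns (trial_id, vector_no, order_no, element).
final class VectorRows {
    private VectorRows() {}

    static final class Row {
        final int trialId;
        final int vectorNo;
        final int orderNo;   // position within the vector (zero-based here, by assumption)
        final float element;

        Row(int trialId, int vectorNo, int orderNo, float element) {
            this.trialId = trialId;
            this.vectorNo = vectorNo;
            this.orderNo = orderNo;
            this.element = element;
        }
    }

    // Produce the rows in array order so order_no preserves element position.
    static List<Row> expand(int trialId, int vectorNo, float[] vector) {
        List<Row> rows = new ArrayList<>(vector.length);
        for (int i = 0; i < vector.length; i++) {
            rows.add(new Row(trialId, vectorNo, i, vector[i]));
        }
        return rows;
    }
}
```

Each Row could then be bound to INSERT INTO VectorData (trial_id, vector_no, order_no, element) VALUES (?, ?, ?, ?) using PreparedStatement.addBatch and sent in one executeBatch call.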
Total space for one vector stored this way (300 rows): 300 x (4+2+2+4) = 3600 bytes. Plus InnoDB record directory (internals stuff) of 16 bytes.
Total space if you serialize a Java array of 300 floats = 1227 bytes?
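If you want to check that figure yourself, a small sketch like the one below measures the size of the standard Java serialization stream for any object. The class name SerializedSize is made up for this example. For a float[300] it reports 1227 bytes: 1200 bytes of payload plus 27 bytes of stream and class-descriptor header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reports how many bytes ObjectOutputStream emits
// when a single object is written with standard Java serialization.
final class SerializedSize {
    private SerializedSize() {}

    static int of(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream and a primitive array.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Note that the question mentions doubles rather than floats; SerializedSize.of(new double[300]) comes to 2427 bytes, since each element takes 8 bytes instead of 4.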
So you save about 2400 bytes, or 67% of the space by storing the array. But suppose you have 100GB of space to store the database. Storing a serialized array allows you to store 87.5 million vectors, whereas the normalized design only allows you to store 29.8 million vectors.
You said you store a few hundred vectors per day. Even at a thousand vectors per day, you'll fill up that 100GB partition in only 81 years with the normalized design, instead of 239 years with serialized arrays.
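The arithmetic behind those figures can be reproduced with a short sketch. It assumes 100GB means 100 x 2^30 bytes, ignores per-row overhead, and uses a rate of 1000 vectors per day (the round number used earlier in this answer); the class name CapacityEstimate is hypothetical.

```java
// Hypothetical helper reproducing the back-of-the-envelope capacity numbers.
final class CapacityEstimate {
    private CapacityEstimate() {}

    // Assumption: "100GB" is read as 100 * 2^30 bytes.
    static final long PARTITION_BYTES = 100L * 1024 * 1024 * 1024;

    // How many vectors fit if each one costs bytesPerVector bytes.
    static long vectorsThatFit(long bytesPerVector) {
        return PARTITION_BYTES / bytesPerVector;
    }

    // Whole years until the partition is full at the given daily rate.
    static long yearsToFill(long bytesPerVector, long vectorsPerDay) {
        return vectorsThatFit(bytesPerVector) / (vectorsPerDay * 365);
    }
}
```

With these assumptions, vectorsThatFit(1227) is about 87.5 million and vectorsThatFit(3600) is about 29.8 million, while yearsToFill(3600, 1000) and yearsToFill(1227, 1000) give 81 and 239 years respectively.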
Re your comment: Performance of INSERT is an important issue, but you're only storing a few hundred vectors per day.
Most MySQL applications can achieve hundreds or thousands of inserts per second without excessive wizardry.
If you need optimal performance, here are some things to look into:
Search for the phrase "mysql inserts per second" on your favorite search engine to read many articles and blogs talking about this.