Recently I've been thinking about best practices for storing historical data in a MySQL database. For now, each versionable table has two columns, valid_from and valid_to, both of type DATETIME. Records holding current data have valid_from set to their creation date. When I update a row, I set its valid_to to the update date and add a new record whose valid_from equals the previous row's valid_to. Easy stuff. But I know the table will grow enormous very quickly, so fetching data could become very slow.
Do you have any best practices for storing historical data?
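The versioning scheme in the question can be sketched roughly as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the table product_prices and its product_id and price columns are hypothetical, and a NULL valid_to is assumed to mark the current row. Closing the old row and inserting the new one happen in a single transaction, so a failure cannot leave a product with no current version.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; product_prices, product_id and price
# are hypothetical names. Timestamps are ISO-8601 text, which sorts chronologically.
SCHEMA = """
CREATE TABLE product_prices (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    price      REAL    NOT NULL,
    valid_from TEXT    NOT NULL,
    valid_to   TEXT              -- NULL means "current version"
)
"""


def create_version(conn, product_id, price, now):
    """Insert the first (current) version of a product's row."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO product_prices (product_id, price, valid_from, valid_to) "
            "VALUES (?, ?, ?, NULL)",
            (product_id, price, now),
        )


def update_version(conn, product_id, new_price, now):
    """Close the current row and open a new one in one transaction."""
    with conn:
        conn.execute(
            "UPDATE product_prices SET valid_to = ? "
            "WHERE product_id = ? AND valid_to IS NULL",
            (now, product_id),
        )
        # The new row's valid_from equals the closed row's valid_to.
        conn.execute(
            "INSERT INTO product_prices (product_id, price, valid_from, valid_to) "
            "VALUES (?, ?, ?, NULL)",
            (product_id, new_price, now),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
create_version(conn, 1, 9.99, "2023-01-01 00:00:00")
update_version(conn, 1, 12.50, "2023-06-01 00:00:00")
rows = conn.execute(
    "SELECT price, valid_from, valid_to FROM product_prices ORDER BY id"
).fetchall()
print(rows)
```

In MySQL the same two statements would run inside START TRANSACTION ... COMMIT against DATETIME columns.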
It's a common mistake to worry about "large" tables and performance. If you can use indexes to access your data, it doesn't really matter whether you have 1,000 or 1,000,000 records, because the cost of an index lookup grows only logarithmically with table size; at least, not in any way you'd be able to measure. The design you describe is commonly used, and it's a great design where time is a key part of the business logic.
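To make the indexing point concrete, here is a rough sketch, assuming sqlite3 as a stand-in for MySQL and a hypothetical composite index idx_price_history on (product_id, valid_from). It loads 100,000 history rows and asks SQLite's EXPLAIN QUERY PLAN how it would run a point-in-time lookup; the plan text is SQLite's, and MySQL's EXPLAIN output looks different.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumption: sqlite3 stands in for MySQL; table, column and index names are
# hypothetical. Timestamps are ISO-8601 text, which sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_prices ("
    " product_id INTEGER NOT NULL, price REAL NOT NULL,"
    " valid_from TEXT NOT NULL, valid_to TEXT)"
)
# Composite index: equality column (product_id) first, range column second.
conn.execute(
    "CREATE INDEX idx_price_history ON product_prices (product_id, valid_from)"
)

# 1,000 products x 100 daily versions = 100,000 history rows.
base = datetime(2020, 1, 1)
FMT = "%Y-%m-%d %H:%M:%S"
rows = []
for product_id in range(1, 1001):
    for version in range(100):
        start = (base + timedelta(days=version)).strftime(FMT)
        end = None if version == 99 else (base + timedelta(days=version + 1)).strftime(FMT)
        rows.append((product_id, product_id * 1000 + version, start, end))
conn.executemany("INSERT INTO product_prices VALUES (?, ?, ?, ?)", rows)

QUERY = (
    "SELECT price FROM product_prices "
    "WHERE product_id = ? AND valid_from <= ? "
    "AND (valid_to IS NULL OR valid_to > ?)"
)
at = "2020-02-15 12:00:00"
params = (42, at, at)

# The plan should report a SEARCH through idx_price_history, not a full SCAN.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()
for step in plan:
    print(step[-1])
price = conn.execute(QUERY, params).fetchone()[0]
print(price)  # 42045.0 (product 42, version 45, valid from 2020-02-15)
```

The index narrows the search to one product's history, so the lookup touches a handful of rows regardless of how many other products' versions the table holds.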
For instance, if you want to know the price of an item at the moment the client placed the order, searching product records where valid_from <= order_date and valid_to is either NULL or > order_date is by far the easiest solution.
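The order-date lookup can be written as a single join. This sketch assumes sqlite3 as a stand-in for MySQL, with hypothetical orders and product_prices tables. It treats each version as the half-open interval [valid_from, valid_to), which matches the question's scheme where a new row's valid_from equals the old row's valid_to: an order placed exactly at a price change matches only the new row.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; orders, product_prices and their
# columns are hypothetical. A NULL valid_to marks the current version.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_prices (
    product_id INTEGER NOT NULL,
    price      REAL    NOT NULL,
    valid_from TEXT    NOT NULL,
    valid_to   TEXT
);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    order_date TEXT    NOT NULL
);
INSERT INTO product_prices VALUES
    (1,  9.99, '2023-01-01 00:00:00', '2023-06-01 00:00:00'),
    (1, 12.50, '2023-06-01 00:00:00', NULL);
INSERT INTO orders VALUES
    (100, 1, '2023-03-15 10:00:00'),
    (101, 1, '2023-06-01 00:00:00'),
    (102, 1, '2024-01-10 08:30:00');
""")

# Each order joins to exactly one price version: the one whose
# [valid_from, valid_to) interval contains the order date.
PRICE_AT_ORDER = """
SELECT o.order_id, p.price
FROM orders AS o
JOIN product_prices AS p
  ON p.product_id = o.product_id
 AND p.valid_from <= o.order_date
 AND (p.valid_to IS NULL OR p.valid_to > o.order_date)
ORDER BY o.order_id
"""
result = conn.execute(PRICE_AT_ORDER).fetchall()
print(result)  # [(100, 9.99), (101, 12.5), (102, 12.5)]
```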
This isn't always the case - if you're keeping the data around just for archive purposes, it may make more sense to create archive tables. However, you have to be sure that time is really not part of the business logic, otherwise the pain of searching multiple tables will be significant - imagine having to search either the product table OR the product_archive table every time you want to find out about the price of a product at the point the order was placed.
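The cost of the archive-table alternative shows up in every historical lookup. This is a minimal sketch, assuming sqlite3 as a stand-in for MySQL and hypothetical product_prices and product_prices_archive tables with identical columns: closed versions are moved to the archive, so a point-in-time query has to search both tables with UNION ALL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; product_prices and
# product_prices_archive are hypothetical tables with identical columns.
conn = sqlite3.connect(":memory:")
COLUMNS = (
    "product_id INTEGER NOT NULL, price REAL NOT NULL, "
    "valid_from TEXT NOT NULL, valid_to TEXT"
)
conn.execute(f"CREATE TABLE product_prices ({COLUMNS})")
conn.execute(f"CREATE TABLE product_prices_archive ({COLUMNS})")
conn.executemany(
    "INSERT INTO product_prices VALUES (?, ?, ?, ?)",
    [
        (1, 9.99, "2023-01-01 00:00:00", "2023-06-01 00:00:00"),
        (1, 12.50, "2023-06-01 00:00:00", None),
    ],
)


def archive_closed_rows(conn):
    """Move every closed (non-current) version into the archive table."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO product_prices_archive "
            "SELECT * FROM product_prices WHERE valid_to IS NOT NULL"
        )
        conn.execute("DELETE FROM product_prices WHERE valid_to IS NOT NULL")


def price_at(conn, product_id, at):
    """Point-in-time lookup that now has to search BOTH tables."""
    row = conn.execute(
        """
        SELECT price FROM (
            SELECT * FROM product_prices
            UNION ALL
            SELECT * FROM product_prices_archive
        )
        WHERE product_id = ? AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        """,
        (product_id, at, at),
    ).fetchone()
    return row[0] if row else None


archive_closed_rows(conn)
print(price_at(conn, 1, "2023-03-15 10:00:00"))  # 9.99, found in the archive
print(price_at(conn, 1, "2024-01-10 08:30:00"))  # 12.5, found in the live table
```

Every query that cares about history has to repeat this union, which is the pain the answer warns about when time is part of the business logic.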