I'm working with MySQL 5.7. I created a table with a virtual column (not stored) of type DATETIME with an index on it. While I was working on it, I noticed that order by was not returning all the data (some data I was expecting at the top was missing). Also the results from MAX and MIN were wrong. After I run
ANALYZE TABLE
CHECK TABLE
OPTIMIZE TABLE
then the results were correct. I guess there was an issue with the index data, so I have few questions:
I'm worried that this could happen in the future but I'll not notice.
EDIT:
as requested in the comments I added the table definition:
CREATE TABLE `items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned DEFAULT NULL,
`image` json DEFAULT NULL,
`status` json DEFAULT NULL,
`status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL COMMENT 'used for index: it checks if status contains expired=true',
`lifetime` tinyint(4) NOT NULL,
`expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
`last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `expiration` (`status_expired`,`expiration`) USING BTREE,
CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
Queries that were returning the wrong results:
SELECT * FROM items ORDER BY expiration DESC;
SELECT max(expiration),min(expiration) FROM items;
Thanks
The SELECT statement is used to select data from a database. The data returned is stored in a result table, called the result-set.
The SQL SELECT statement returns a result set of records, from one or more tables. A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language (DML) command.
If you change the structure of any tables in the underlying view, select * may break any applications that rely on the columns being in a specific order etc. It's generally accepted that doing select * anywhere, not just in view definitions, is bad practice.
The trouble is that your data comes from virtual columns materialized via indexes. The check, optimize, analyze operations you are doing forces the indexes to be synced and fixes any errors. That gives you the correct results henceforth. At least until the index gets out of sync again.
Much of the problems are caused by issues with your table design. Let's start with.
`status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL
No doubt this is created to overcome the fact that you cannot directly index a JSON
column in mysql. You have created a virtual column and indexed that instead. It's all very well, but this column can hold only one of two values; true
or false
. Which means it has very poor cadinality. As a result, mysql is unlikely to use this index for anything.
But we can see that you have combined the status_expired
column with the expired
column when creating the index. Perhaps with the idea of overcoming this poor cardinality mentioned above. But wait...
`expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
Expiration is another virtual column. This has some repercussions.
When a secondary index is created on a generated virtual column, generated column values are materialized in the records of the index. If the index is a covering index (one that includes all the columns retrieved by a query), generated column values are retrieved from materialized values in the index structure instead of computed “on the fly”.
Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html#json-column-indirect-index
This is contrary to
VIRTUAL: Column values are not stored, but are evaluated when rows are read, immediately after any BEFORE triggers. A virtual column takes no storage.
Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
We create virtual columns based on the sound principal that values generated by simple operations on columns shouldn't be stored to avoid redundancy, but by creating an index on it, we reintroduce redundancy.
based on the information provided, you don't really seem to need the status_expired
column or even the expired
column. An item that's past it's expiry date is expired!
CREATE TABLE `items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned DEFAULT NULL,
`image` json DEFAULT NULL,
`status` json DEFAULT NULL,
`expire_date` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
`last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `expiration` (`expired_date`) USING BTREE,
CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
Simply compare the current date with the expired_date column in the above table when you need to find out which items have expired. The difference here is instead of expired
being a calculated item in every query, you calculate the expiry_date
once, when you create the record.
This makes your table a lot neater and queries possibly faster
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With