Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL SELECT return wrong results

I'm working with MySQL 5.7. I created a table with a virtual column (not stored) of type DATETIME with an index on it. While I was working on it, I noticed that order by was not returning all the data (some data I was expecting at the top was missing). Also the results from MAX and MIN were wrong. After I run

ANALYZE TABLE 
CHECK TABLE
OPTIMIZE TABLE

then the results were correct. I guess there was an issue with the index data, so I have few questions:

  1. When and why this could happen?
  2. Is there a way to prevent this?
  3. among the 3 command I run, which is the correct one to use?

I'm worried that this could happen in the future but I'll not notice.

EDIT:

as requested in the comments I added the table definition:

CREATE TABLE `items` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned DEFAULT NULL,
  `image` json DEFAULT NULL,
  `status` json DEFAULT NULL,
  `status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL COMMENT 'used for index: it checks if status contains expired=true',
  `lifetime` tinyint(4) NOT NULL,
  `expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
  `last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`),
  KEY `expiration` (`status_expired`,`expiration`) USING BTREE,
  CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

Queries that were returning the wrong results:

SELECT * FROM items ORDER BY expiration DESC;
SELECT max(expiration),min(expiration) FROM items;

Thanks

like image 740
Stefano Giacone Avatar asked Jul 23 '16 16:07

Stefano Giacone


People also ask

What does select return in MySQL?

The SELECT statement is used to select data from a database. The data returned is stored in a result table, called the result-set.

What does select * from return?

The SQL SELECT statement returns a result set of records, from one or more tables. A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language (DML) command.

Can we use select * In view?

If you change the structure of any tables in the underlying view, select * may break any applications that rely on the columns being in a specific order etc. It's generally accepted that doing select * anywhere, not just in view definitions, is bad practice.


1 Answers

TLDR;

The trouble is that your data comes from virtual columns materialized via indexes. The check, optimize, analyze operations you are doing forces the indexes to be synced and fixes any errors. That gives you the correct results henceforth. At least until the index gets out of sync again.

Why it may happen

Much of the problems are caused by issues with your table design. Let's start with.

`status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL

No doubt this is created to overcome the fact that you cannot directly index a JSON column in mysql. You have created a virtual column and indexed that instead. It's all very well, but this column can hold only one of two values; true or false. Which means it has very poor cadinality. As a result, mysql is unlikely to use this index for anything.

But we can see that you have combined the status_expired column with the expired column when creating the index. Perhaps with the idea of overcoming this poor cardinality mentioned above. But wait...

`expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,

Expiration is another virtual column. This has some repercussions.

When a secondary index is created on a generated virtual column, generated column values are materialized in the records of the index. If the index is a covering index (one that includes all the columns retrieved by a query), generated column values are retrieved from materialized values in the index structure instead of computed “on the fly”.

Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html#json-column-indirect-index

This is contrary to

VIRTUAL: Column values are not stored, but are evaluated when rows are read, immediately after any BEFORE triggers. A virtual column takes no storage.

Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html

We create virtual columns based on the sound principal that values generated by simple operations on columns shouldn't be stored to avoid redundancy, but by creating an index on it, we reintroduce redundancy.

Proposed fixes

based on the information provided, you don't really seem to need the status_expired column or even the expired column. An item that's past it's expiry date is expired!

CREATE TABLE `items` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned DEFAULT NULL,
  `image` json DEFAULT NULL,
  `status` json DEFAULT NULL,
  `expire_date` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
  `last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`),
  KEY `expiration` (`expired_date`) USING BTREE,
  CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

Simply compare the current date with the expired_date column in the above table when you need to find out which items have expired. The difference here is instead of expired being a calculated item in every query, you calculate the expiry_date once, when you create the record.

This makes your table a lot neater and queries possibly faster

like image 95
e4c5 Avatar answered Oct 03 '22 22:10

e4c5