I have a big table in mysql.It has 13 million rows.
Mysql version is 5.7.10.
Table structure as below:
create table table_name (
user_id varchar(20) not null,
item_id varchar(20) not null
);
1. The first sql is:
select count(distinct user_id) from table;
result:760,000
2. The second sql is:
select count(1) from (select user_id from table group by user_id) a;
result:120,000
user_id is not null for each row.
And, the right number is 120,000.Why the first sql get the wrong number?
Then,I run the first sql in hive and spark-sql, the result is 120,000.
So, is this a mysql's bug or something can be setting to make things right?
Thank you!
Update:I try it on another PC, the result of first sql is 120,000.This time get the right number.Mysql version is 5.6.26. So, maybe it is a bug of 5.7.10.
There are multiple known bugs in MySQL count distinct when a column is included in two-column unique key.
here and here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With