When I add DISTINCT to my query, the reported query duration increases from 0.015 s to over 6 seconds.
I want to join several tables that are linked via foreign keys and select a distinct column from the result:
select distinct table3.idtable3 from
table1
join table2 on table1.idtable1 = table2.fkey
join table3 on table2.idtable2 = table3.fkey
where table1.idtable1 = 1
The DISTINCT query takes over 6 seconds, which seems like it should be improvable.
With the plain SELECT:
duration : 0.015s / fetch:5.532s (5.760.434 rows)
Explain:
id | select_type | table  | partitions | type  | possible_keys           | key                     | key_len | ref                  | rows  | filtered | Extra
1  | SIMPLE      | table1 | NULL       | index | asd                     | asd                     | 137     | NULL                 | 10    | 10.00    | Using where; Using index
1  | SIMPLE      | table2 | NULL       | ALL   | idtable2                | NULL                    | NULL    | NULL                 | 200   | 25.00    | Using where; Using join buffer (Block Nested Loop)
1  | SIMPLE      | table3 | NULL       | ref   | fkey_table2_table_3_idx | fkey_table2_table_3_idx | 138     | mydb.table2.idtable2 | 66641 | 100.00   | NULL
With SELECT DISTINCT:
duration : 6.625s / fetch:0.000s (1000 rows)
Explain:
id | select_type | table  | partitions | type  | possible_keys           | key                     | key_len | ref                  | rows  | filtered | Extra
1  | SIMPLE      | table1 | NULL       | index | asd                     | asd                     | 137     | NULL                 | 10    | 10.00    | Using where; Using index; Using temporary
1  | SIMPLE      | table2 | NULL       | ALL   | idtable2                | NULL                    | NULL    | NULL                 | 200   | 25.00    | Using where; Using join buffer (Block Nested Loop)
1  | SIMPLE      | table3 | NULL       | ref   | fkey_table2_table_3_idx | fkey_table2_table_3_idx | 138     | mydb.table2.idtable2 | 66641 | 100.00   | NULL
Database: the schema is included in the MCRE below.
Code for testing / MCRE:
import mysql.connector
import time
import numpy as np
"""
-- MySQL Script generated by MySQL Workbench
-- Fri Jan 17 12:19:26 2020
-- Model: New Model Version: 1.0
-- MySQL Workbench Forward Engineering
SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';
-- -----------------------------------------------------
-- Schema mydb
-- -----------------------------------------------------
-- -----------------------------------------------------
-- Schema mydb
-- -----------------------------------------------------
CREATE SCHEMA IF NOT EXISTS `mydb` DEFAULT CHARACTER SET utf8 ;
USE `mydb` ;
-- -----------------------------------------------------
-- Table `mydb`.`table1`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`table1` (
`idtable1` VARCHAR(45) NOT NULL,
INDEX `asd` (`idtable1` ASC) VISIBLE)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `mydb`.`table2`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`table2` (
`idtable2` VARCHAR(45) NOT NULL,
`fkey` VARCHAR(45) NULL,
INDEX `link_table1_table2_idx` (`fkey` ASC) INVISIBLE,
INDEX `idtable2` (`idtable2` ASC) VISIBLE,
CONSTRAINT `link_table1_table2`
FOREIGN KEY (`fkey`)
REFERENCES `mydb`.`table1` (`idtable1`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `mydb`.`table3`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`table3` (
`idtable3` VARCHAR(45) NOT NULL,
`fkey` VARCHAR(45) NULL,
INDEX `fkey_table2_table_3_idx` (`fkey` ASC) VISIBLE,
CONSTRAINT `fkey_table2_table_3`
FOREIGN KEY (`fkey`)
REFERENCES `mydb`.`table2` (`idtable2`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;
"""
def insertData():
    # Populate the three tables with random test data (runs twice, so the
    # tables deliberately contain duplicate values).
    for _ in range(2):
        num_distinct_table1_values = 5
        num_distinct_table2_values = 10
        num_distinct_table3_values = 1000
        num_entries_table1 = num_distinct_table1_values
        num_entries_table2 = num_distinct_table2_values * 10
        num_entries_table3 = num_distinct_table3_values * 300
        random_numbers_table1_id = range(num_distinct_table1_values)
        random_numbers_table2_id = np.random.randint(num_distinct_table2_values, size=num_entries_table2)
        random_numbers_table2_fkey = np.random.randint(num_distinct_table1_values, size=num_entries_table2)
        random_numbers_table3_id = np.random.randint(num_distinct_table3_values, size=num_entries_table3)
        random_numbers_table3_fkey = np.random.randint(num_distinct_table2_values, size=num_entries_table3)
        # Build the VALUES lists for the bulk INSERT statements.
        value_string_table1 = ','.join(f"('{i_name}')" for i_name in random_numbers_table1_id)
        value_string_table2 = ','.join(
            f"('{random_numbers_table2_id[i]}','{random_numbers_table2_fkey[i]}')"
            for i in range(num_entries_table2))
        value_string_table3 = ','.join(
            f"('{random_numbers_table3_id[i]}','{random_numbers_table3_fkey[i]}')"
            for i in range(num_entries_table3))
        # fill table 1
        mySql_insert_query = f"INSERT INTO table1 (idtable1) VALUES {value_string_table1}"
        cursor.execute(mySql_insert_query)
        conn.commit()
        print("Done table 1")
        # fill table 2
        mySql_insert_query = f"INSERT INTO table2 (idtable2, fkey) VALUES {value_string_table2}"
        cursor.execute(mySql_insert_query)
        print("Done table 2")
        # fill table 3
        mySql_insert_query = f"INSERT INTO table3 (idtable3, fkey) VALUES {value_string_table3}"
        cursor.execute(mySql_insert_query)
        print("Done table 3")
        conn.commit()

conn = mysql.connector.connect(user='root', password='admin', host='127.0.0.1',
                               database='mydb', raise_on_warnings=True, autocommit=False)
cursor = conn.cursor()
insertData()
conn.close()
Very few queries run faster with SELECT DISTINCT, and a few run slower (though not significantly slower). In the latter case, however, the application would otherwise likely have to examine the duplicate rows itself, which shifts the performance and complexity burden onto the application.
Yes, every record has to be compared against the cache of "distinct" records as the query runs. You can improve performance by using an index, particularly on numeric and date columns.
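For the query in the question, for example (a sketch only; the index name idx_fkey_idtable3 is made up and the right index depends on your data), a covering index on table3 would let the server resolve both the join condition and the DISTINCT column from the index alone:
-- Hypothetical covering index: (fkey, idtable3) serves both the join
-- condition on table3.fkey and the selected column table3.idtable3.
ALTER TABLE mydb.table3
  ADD INDEX idx_fkey_idtable3 (fkey, idtable3);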
DISTINCT filters the result down to unique records; it removes the duplicate rows. A SELECT DISTINCT will be the same speed as, or faster than, the equivalent GROUP BY.
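For reference, here is the query from the question next to its GROUP BY equivalent (same schema, same result set; purely illustrative):
-- DISTINCT form (as in the question)
select distinct table3.idtable3
from table1
join table2 on table1.idtable1 = table2.fkey
join table3 on table2.idtable2 = table3.fkey
where table1.idtable1 = 1;

-- equivalent GROUP BY form
select table3.idtable3
from table1
join table2 on table1.idtable1 = table2.fkey
join table3 on table2.idtable2 = table3.fkey
where table1.idtable1 = 1
group by table3.idtable3;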
Summary. DISTINCT makes a query return unique rows only, and is often used injudiciously to suppress duplicate rows being returned by a bad query. Don't use DISTINCT to cover up errors.
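A generic illustration of that kind of misuse (the customers and orders tables here are made up, not part of the question's schema):
-- Misuse: the join to orders fans out one row per order, and DISTINCT
-- is then used to hide the duplication.
SELECT DISTINCT c.id, c.name
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id;

-- Better: express the actual question (customers with at least one order)
-- so no duplicates are produced in the first place.
SELECT c.id, c.name
FROM customers AS c
WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id);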
Thanks for the CREATE TABLEs; you might never have gotten an Answer without them.

Every table should have a PRIMARY KEY. If you have a column (or combination of columns) that 'naturally' works, use that. Else use an AUTO_INCREMENT.

INDEX(fkey) on table2 is INVISIBLE, hence not used. Don't waste learning time on VISIBLE/INVISIBLE; you may never need them in your career.
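As a sketch of those two points only (the surrogate id columns below are made-up additions, not part of the original schema; the index name comes from the MCRE):
-- Hypothetical sketch: give each table a surrogate PRIMARY KEY.
ALTER TABLE mydb.table1 ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
ALTER TABLE mydb.table2 ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
ALTER TABLE mydb.table3 ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- Make the foreign-key index on table2 usable again (it is declared INVISIBLE).
ALTER TABLE mydb.table2 ALTER INDEX link_table1_table2_idx VISIBLE;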
And...

duration : 0.015s / fetch:5.532s (5.760.434 rows)
duration : 6.625s / fetch:0.000s (1000 rows)

Notice how both are about 6 seconds; the time is just split up differently.

Without DISTINCT, the query can pump out the data immediately, but it takes a long time due to network latency while the 5.7M rows are fetched.

With DISTINCT, the first row cannot come out until after performing the "de-duplication", which probably involves a "temporary" (see the EXPLAIN) and a sort. So now all the time is spent computing before any data is sent.

The DISTINCT one is slightly slower (total time) because of the extra step of collecting and sorting 5.7M rows.
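If you want to compare the server-side cost of the two without the network fetch dominating, one rough way (my sketch, not part of the answer above) is to aggregate on the server so only a single row comes back in each case:
-- Sketch: both queries return one row, so fetch time is negligible and
-- the reported duration is essentially the server-side work.
select count(*)
from table1
join table2 on table1.idtable1 = table2.fkey
join table3 on table2.idtable2 = table3.fkey
where table1.idtable1 = 1;

select count(distinct table3.idtable3)
from table1
join table2 on table1.idtable1 = table2.fkey
join table3 on table2.idtable2 = table3.fkey
where table1.idtable1 = 1;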