Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

optimising a mysql query with 2 joins and group by clauses

I have a query that takes 10-20 seconds, but I am sure it can be optimized, I am just not good enough to do it. I would like some help and explaining so that I can apply it to similar queries. Here is my query :

SELECT
        `store_formats`.`Store Nbr`,
        `store_formats`.`Store Name`,
        `store_formats`.`Format Name`,
        `eds_sales`.`Date`,
         sum(`eds_sales`.`EPOS Sales`) AS Sales,
         sum(`eds_sales`.`EPOS Quantity`) AS Quantity
         FROM
         `eds_sales`
         INNER JOIN `item_codes` ON `eds_sales`.`Prime Item Nbr` = `item_codes`.`Customer Item`
         INNER JOIN `store_formats` ON `eds_sales`.`Store Nbr` = `store_formats`.`Store Nbr`
         WHERE
         `eds_sales`.`Store Nbr` IN ($storenbr) AND
         `eds_sales`.`Date`  BETWEEN '$startdate' AND '$enddate' AND
         `eds_sales`.`Client` = '$customer' AND
         `eds_sales`.`Retailer` IN ($retailer) AND
         `store_formats`.`Format Name` IN ($storeformat) AND
         `item_codes`.`Item Number` IN ($products)
         GROUP BY
         `store_formats`.`Store Name`,
         `store_formats`.`Store Nbr`,
         `store_formats`.`Format Name`,
         `eds_sales`.`Date`

Here is the explain output : enter image description here

As you will see there, I have tried and created a few indexes with the columns involved with not much success. The main delay is caused from the copying to temporary table I think.

Those are the tables involved :

store_formats :

CREATE TABLE `store_formats` (
`id` int(12) NOT NULL,
`Store Nbr` smallint(5) UNSIGNED DEFAULT NULL,
`Store Name` varchar(27) DEFAULT NULL,
`City` varchar(19) DEFAULT NULL,
`Post Code` varchar(9) DEFAULT NULL,
`Region #` int(2) DEFAULT NULL,
`Region Name` varchar(10) DEFAULT NULL,
`Distr #` int(3) DEFAULT NULL,
`Dist Name` varchar(26) DEFAULT NULL,
`Square Footage` varchar(7) DEFAULT NULL,
`Format` int(1) DEFAULT NULL,
`Format Name` varchar(23) DEFAULT NULL,
`Store Type` varchar(20) DEFAULT NULL,
`TV Region` varchar(12) DEFAULT NULL,
`Pharmacy` varchar(3) DEFAULT NULL,
`Optician` varchar(3) DEFAULT NULL,
`Home Shopping` varchar(3) DEFAULT NULL,
`Retailer` varchar(15) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `store_formats`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `uniqness` (`Store Nbr`,`Store Name`,`Format`),
ADD KEY `Store Nbr_2` (`Store Nbr`,`Format Name`,`Store Name`);

eds_sales :

CREATE TABLE `eds_sales` (
`id` int(12) UNSIGNED NOT NULL,
`Prime Item Nbr` mediumint(7) NOT NULL,
`Prime Item Desc` varchar(255) NOT NULL,
`Prime Size Desc` varchar(255) NOT NULL,
`Variety` varchar(255) NOT NULL,
`WHPK Qty` int(5) NOT NULL,
`SUPPK Qty` int(5) NOT NULL,
`Depot Nbr` int(5) NOT NULL,
`Depot Name` varchar(50) NOT NULL,
`Store Nbr` smallint(5) UNSIGNED NOT NULL,
`Store Name` varchar(255) NOT NULL,
`EPOS Quantity` smallint(3) NOT NULL,
`EPOS Sales` decimal(13,2) NOT NULL,
`Date` date NOT NULL,
`Client` varchar(10) NOT NULL,
`Retailer` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `eds_sales`
ADD UNIQUE KEY `uniqness` (`Prime Item Nbr`,`Prime Item Desc`,`Prime Size Desc`,`Variety`,`WHPK Qty`,`SUPPK Qty`,`Depot Nbr`,`Depot Name`,`Store Nbr`,`Store Name`,`Date`,`Client`) USING BTREE,
ADD KEY `Store Nbr` (`Store Nbr`),
ADD KEY `Prime Item Nbr_2` (`Prime Item Nbr`,`Date`),
ADD KEY `id` (`id`) USING BTREE,
ADD KEY `Store Nbr_2` (`Prime Item Nbr`,`Store Nbr`,`Date`,`Client`,`Retailer`) USING BTREE,
ADD KEY `Client` (`Client`,`Store Nbr`,`Date`),
ADD KEY `Date` (`Date`,`Client`,`Retailer`);

item_codes :

CREATE TABLE `item_codes` (
`id` int(12) NOT NULL,
`Item Number` varchar(30) CHARACTER SET latin1 NOT NULL,
`Customer Item` mediumint(7) NOT NULL,
`Description` varchar(255) CHARACTER SET latin1 NOT NULL,
`Status` varchar(15) CHARACTER SET latin1 NOT NULL,
`Customer` varchar(30) CHARACTER SET latin1 NOT NULL,
`Sort Name` varchar(255) CHARACTER SET latin1 NOT NULL,
`EquidataCustomer` varchar(30) CHARACTER SET latin1 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `item_codes`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `uniq` (`Item Number`,`Customer Item`,`Customer`,`EquidataCustomer`),
ADD KEY `Item Number_2` (`Item Number`,`Sort Name`,`EquidataCustomer`),
ADD KEY `Customer Item` (`Customer Item`,`Item Number`,`Sort Name`,`EquidataCustomer`),
ADD KEY `Customer Item_2` (`Customer Item`,`Item Number`,`EquidataCustomer`);

So my questions : As you can see I am joining 3 tables, and I am looking for sales by date by store format. I have been trying different kind of joins, or for example instead of joining sales to item_codes and store_formats, joining store_formats to the others, but with same results. I am also passing some arrays of variables using the IN as those are fed by select-boxes in the application.

  1. Best way to join those tables
  2. Suggest the best indexes per table
  3. why do I get temporary tables? is it because of the group by? is there a workaround?
  4. If there is a need for temp tables is there anyway to speed this creation up? (I already have the data folder in a raid with 8 disks but still slow.
  5. Of course any suggested alternatives are welcome

UPDATE : Updated my tables with some suggestions from the comments

UPDATE : Modified my my.cnf as bellow increased performance (My RAM is 8GB, 2 cores, /data/tmp is on a 8 drive raid , same as where the data is)

tmpdir          = /dev/shm/:/data/tmp:/tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
expire_logs_days        = 10
max_binlog_size   = 100M
innodb_buffer_pool_size = 6G
innodb_buffer_pool_instances = 6
query_cache_type=1
like image 909
Ioannis Kokkinis Avatar asked Oct 18 '22 02:10

Ioannis Kokkinis


2 Answers

(Far too much to put in a Comment; please excuse me for using an Answer.)

When you have INDEX(a) and INDEX(a,b), the former is redundant and should be removed. I see about 5 cases of such.

Each store_nbr has exactly one store_name? If so, it is redundant to have store_name in more than one table. I don't know the intent of store_formats, but I guess that is the one table to house the store_name. Note that there is an inconsistent size for the datatypes of the two store_name columns and for the store_nbr columns!

Seems like every store should have a unique number, if so, then ADD UNIQUE KEY uniqness (Store Nbr,Store Name) should probably be turned into PRIMARY KEY(store_nbr). (Sorry, I am not going to put spaces in you column names.)

It is rarely useful to start an index with a date, so get rid of KEY Date_2 (Date,Client). In its place, add INDEX(Client, store_nbr, Date); that should have a direct impact on the speed of the query. You will probably see the EXPLAIN SELECT... change.

int(4) -- perhaps you meant SMALLINT UNSIGNED?

Having Date in a UNIQUE (or PRIMARY) key is usually "wrong". What it a 'Client' made two purchases of the same thing in the same day?

After you have made those changes, let's talk some more.

For consistency of viewing, please provide SHOW CREATE TABLE.

Avoid this construct:

FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...

It is inefficient because neither subquery has an index for making the JOIN efficient.

like image 58
Rick James Avatar answered Oct 27 '22 19:10

Rick James


Moved selects into sub-queries to minimize items for the join. I believe MySQL would have done that already for you. I would check the execution plan for that info.

SELECT
  stores.nbr, stores.name, stores.format,
  epos.date,
  sum(epos.sales) AS Sales,
  sum(epos.qty) AS Quantity
FROM
  (SELECT `Date` as `date`, `EPOS Sales` as sales,`EPOS Quantity` as qty, `Prime Item Nbr` as item_number, `Store Nbr` as store_number
FROM
  `eds_sales`
WHERE
  `eds_sales`.`Store Nbr` IN ($storenbr) AND
  `eds_sales`.`Date`  BETWEEN '$startdate' AND '$enddate' AND
  `eds_sales`.`Client` = '$customer' AND
  `eds_sales`.`Retailer` IN ($retailer)) as epos

  INNER JOIN 

   (SELECT `Customer Item` as custItem
   FROM `item_codes`
   WHERE
     `item_codes`.`Item Number` IN ($products)) as items ON epos.item_number = items.custItem

  INNER JOIN 

    (SELECT `Store Nbr` as nbr, `Store Name` as name, `Format Name` as format
   FROM
     `store_formats`
   WHERE
     `store_formats`.`Format Name` IN ($storeformat)) as stores ON epos.store_number = stores.nbr
GROUP BY
  stores.name,
  stores.nbr,
  stores.format,
  epos.date
like image 30
justindevs Avatar answered Oct 27 '22 19:10

justindevs