Why is my MySQL group by so slow?

Tags:

I am trying to query against a partitioned table (by month) approaching 20M rows. I need to group by DATE(transaction_utc) as well as country_id. The rows that get returned if i turn off the group by and aggregates is just over 40k, which isn't too many, however adding the group by makes the query substantially slower unless said GROUP BY is on the transaction_utc column, in which case it gets FAST.

I've been trying to optimize this first query below by tweaking the query and/or the indexes, and got to the point below (about 2x as fast as initially) however still stuck with a 5s query for summarizing 45k rows, which seems way too much.

For reference, this box is a brand new 24 logical core, 64GB RAM, Mariadb-5.5.x server with way more INNODB buffer pool available than index space on the server, so shouldn't be any RAM or CPU pressures.

So, I'm looking for ideas on what is causing this slow down and suggestions on speeding it up. Any feedback would be greatly appreciated! :)

Ok, onto the details...

The following query (the one I actually need) takes approx 5 seconds (+/-), and returns less than 100 rows.

SELECT lss.`country_id` AS CountryId
, Date(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName,  lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
FROM `sales` lss  
JOIN `countries` c ON lss.`country_id` = c.`country_id`  
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )  GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`)

EXPLAIN SELECT for the same query is as follows. Notice that it's not using the transaction_utc key. Shouldn't it be using my covering index instead?

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  lss ref idx_unique,transaction_utc,country_id   idx_unique  50  const   1208802 Using where; Using temporary; Using filesort
1   SIMPLE  c   eq_ref  PRIMARY PRIMARY 4   georiot.lss.country_id  1

Now onto a couple other options that I've tried to attempt to determine whats going on...

The following query (changed group by) takes about 5 seconds (+/-), and returns only 3 rows:

SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName,  lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
FROM `sales` lss  
JOIN `countries` c ON lss.`country_id` = c.`country_id`  
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )  GROUP BY lss.`country_id`

The following query (removed group by) takes 4-5 seconds (+/-) and returns 1 row:

SELECT lss.`country_id` AS CountryId
    , DATE(lss.`transaction_utc`) AS TransactionDate
    , c.`name` AS CountryName,  lss.`country_id` AS CountryId
    , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
    , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
    FROM `sales` lss  
    JOIN `countries` c ON lss.`country_id` = c.`country_id`  
    WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )

The following query takes .00X seconds (+/-) and returns ~45k rows. This to me shows that at max we're only trying to group 45K rows into less than 100 groups (as in my initial query):

SELECT lss.`country_id` AS CountryId
    , DATE(lss.`transaction_utc`) AS TransactionDate
    , c.`name` AS CountryName,  lss.`country_id` AS CountryId
    , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
    , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
    FROM `sales` lss  
    JOIN `countries` c ON lss.`country_id` = c.`country_id`  
    WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`transaction_utc`

TABLE SCHEMA:

CREATE TABLE IF NOT EXISTS `sales` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_linkshare_account_id` int(11) unsigned NOT NULL,
  `username` varchar(16) NOT NULL,
  `country_id` int(4) unsigned NOT NULL,
  `order` varchar(16) NOT NULL,
  `raw_tracking_code` varchar(255) DEFAULT NULL,
  `transaction_utc` datetime NOT NULL,
  `processed_utc` datetime NOT NULL ,
  `sku` varchar(16) NOT NULL,
  `sale_original` decimal(10,4) NOT NULL,
  `sale_usd` decimal(10,4) NOT NULL,
  `quantity` int(11) NOT NULL,
  `commission_original` decimal(10,4) NOT NULL,
  `commission_usd` decimal(10,4) NOT NULL,
  `original_currency` char(3) NOT NULL,
  PRIMARY KEY (`id`,`transaction_utc`),
  UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`),
  KEY `raw_tracking_code` (`raw_tracking_code`),
  KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`),
  KEY `idx_countries` (`country_id`),
  KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( TO_DAYS(`transaction_utc`))
(PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB,
 PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB,
 PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB,
 PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
 PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
 PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ;

694

asked Oct 27 '12 20:10

JesseP

1 Answers

The offending part is probably the GROUP BY DATE(transaction_utc). You also claim to have a covering index for this query but I see none. Your 5-column index has all the columns used in the query but not in the best order (which is: WHERE - GROUP BY - SELECT).

So, the engine, finding no useful index, would have to evaluate this function for all the 20M rows. Actually, it finds an index that starts with username (the idx_unique) and it uses that, so it has to evaluate the function for (only) 1.2M rows. If you had a (transaction_utc) or a (username, transaction_utc) it would choose the most useful of the three.

Can you afford to change the table structure by splitting the column into date and time parts? If you can, then an index on (username, country_id, transaction_date) or (changing the order of the two columns used for grouping), on (username, transaction_date, country_id) would be quite efficient.

A covering index on (username, country_id, transaction_date, sale_usd, commission_usd) even better.

If you want to keep the current structure, try changing the order inside your 5-column index to:

(username, country_id, transaction_utc, sale_usd, commission_usd)

or to:

(username, transaction_utc, country_id, sale_usd, commission_usd)

Since you are using MariaDB, you can use the VIRTUAL columns feature, without changing the existing columns:

Add a virtual (persistent) column and the appropriate index:

ALTER TABLE sales 
    ADD COLUMN transaction_date DATE NOT NULL
               AS DATE(transaction_utc) 
               PERSISTENT 
    ADD INDEX special_IDX 
        (username, country_id, transaction_date, sale_usd, commission_usd) ;

181

answered Oct 13 '22 00:10

ypercubeᵀᴹ

Related questions
                            
                                Preventing MySQL from inserting implicit default values into not null columns
                            
                                Python MySQLdb slow in updating values
                            
                                MySQL: Group by two columns and sum
                            
                                Concurrency in Doctrine
                            
                                Select the Latest Message Between the communication of two user in mysql
                            
                                Why same query giving two different result?
                            
                                Ignore particular WHERE criteria
                            
                                How to hide password to MySQL database from people using the program
                            
                                Validate Mobile number in php form
                            
                                MySQL: How to lock tables and start a transaction?
                            
                                Export database schema as ASCII from mySQL so I can post it on SO
                            
                                How to sum records of a single column with different possibilities?
                            
                                How to set up a JDBCRealm in Apache Tomcat 7?
                            
                                how to use LINQ to SQL with mySQL
                            
                                Check if query was successful in sqlite for Android
                            
                                MySQL dynamic-pivot
                            
                                Select similar IP addresses - ignore last 3 digits
                            
                                MySQL: How many UPDATES per second can an average box support?
                            
                                Update column with random value
                            
                                mysql distinct row or distinct field

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is my MySQL group by so slow?

Tags:

performance

sql

optimization

mysql

mariadb

JesseP

People also ask

1 Answers

ypercubeᵀᴹ

Recent Activity

Donate For Us