Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize this simple JOIN+ORDER BY query?

I have two mysql tables:

/* Table users */
CREATE TABLE IF NOT EXISTS `users` (
  `Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `DateRegistered` datetime NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/* Table statistics_user */
CREATE TABLE IF NOT EXISTS `statistics_user` (
  `UserId` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Sent_Views` int(10) unsigned NOT NULL DEFAULT '0',
  `Sent_Winks` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`UserId`),
  CONSTRAINT `statistics_user_ibfk_1` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Both tables are populated with 10.000 random rows for testing by using the following procedure:

DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillUsersStatistics`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE dt DATE;
DECLARE Winks INT DEFAULT 1;
DECLARE Views INT DEFAULT 1;

WHILE (i<=cnt) DO
        SET dt = str_to_date(concat(floor(1 + rand() * (9-1)),'-',floor(1 + rand() * (28 -1)),'-','2011'),'%m-%d-%Y');

        INSERT INTO users (Id, DateRegistered) VALUES(i, dt);

        SET Winks = floor(1 + rand() * (30-1));
        SET Views = floor(1 + rand() * (30-1));
        INSERT INTO statistics_user (UserId, Sent_Winks, Sent_Views) VALUES (i, Winks, Views);

     SET i=i+1;
END WHILE;

END//
DELIMITER ;
CALL `FillUsersStatistics`(10000);

The problem:

When I run the EXPLAIN for this query:

SELECT
t1.Id, (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) as Score
FROM users t1
JOIN  statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC

.. I get this explain:

Id  select_type table   type    possible_keys   key     key_len     ref             rows    extra
1   SIMPLE      t1      ALL     PRIMARY         (NULL)  (NULL)      (NULL)          10037   Using temporary; Using filesort
1   SIMPLE      t2      eq_ref  PRIMARY         PRIMARY 4           test2.t2.UserId 1   

The above query gets very slow when both tables have more than 500K rows. I guess it's because of the 'Using temporary; Using filesort' in the explain of the query.

How can the above query be optimized so that it runs faster?

like image 470
user1009456 Avatar asked Nov 14 '22 15:11

user1009456


1 Answers

I'm faily sure that the ORDER BY is what's killing you, since it cannot be properly indexed. Here is a workable, if not particularly pretty, solution.

First, let's say you have a column named Score for storing a user's current score. Every time a user's Sent_Views or Sent_Winks changes, modify the Score column to match. This could probably be done with a trigger (my experience with triggers is limited), or definitely done in the same code that updates the Sent_Views and Sent_Winks fields. This change wouldn't need to know the DATEDIFF portion, because it could just divide by the old sum of Sent_Views + Sent_Winks and multiply by the new one.

Now you just need to change the Score column once per day (if you're not picky about the precise number of hours a user has been registered). This could be done with a script run by a cron job.

Then, just index the Score column and SELECT away!

Note: edited to remove incorrect first attempt.

like image 105
Red Orca Avatar answered Nov 17 '22 04:11

Red Orca