Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysql: Optimize count query on derived table

I am trying to do a count query on a derived table for paging purposes. The query look like this:

SELECT 
    assignment_completions.id as id,
    assignment_completions.first_name,
    assignment_completions.last_name,
    groups.name

FROM
    assignment_completions
        LEFT JOIN
    groups_users ON assignment_completions.user_id = groups_users.user_id
        LEFT JOIN
    groups ON groups_users.group_id = groups.id
WHERE
    assignment_completions.handler = 'course'
GROUP BY assignment_completions.id

The count query just wraps the above query like this:

SELECT COUNT(*) FROM (...) AS assignment_count

The query without the count executes in .005 seconds. The query with the count executes in 1.5 seconds.

I have tried the following with no luck:

1) Use indexed column (there is no performance gain here):

SELECT COUNT (id) FROM (...)

2) I have tried using SQL_CALC_FOUND_ROWS but it was actually a little slower (2 seconds or so).

Details:

assignment_completions: 200k rows

users: 35k rows

groups_users: 500k rows

groups: 1k rows

Table Definitions

CREATE TABLE `assignment_completions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `assignment_id` int(11) DEFAULT NULL,
  `handler` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `handler_id` int(11) DEFAULT NULL,
  `time_started` datetime DEFAULT NULL,
  `time_end` datetime DEFAULT NULL,
  `status` int(11) DEFAULT NULL,
  `application_instance_id` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_assignment_completions_on_first_name` (`first_name`) USING BTREE,
  KEY `index_assignment_completions_on_last_name` (`last_name`) USING BTREE,
  KEY `index_assignment_completions_on_user_id` (`user_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=200001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `email` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `encrypted_password` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `reset_password_token` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `reset_password_sent_at` datetime DEFAULT NULL,
  `remember_created_at` datetime DEFAULT NULL,
  `sign_in_count` int(11) NOT NULL DEFAULT '0',
  `current_sign_in_at` datetime DEFAULT NULL,
  `last_sign_in_at` datetime DEFAULT NULL,
  `current_sign_in_ip` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `last_sign_in_ip` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `application_instance_id` int(11) DEFAULT NULL,
  `username` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `status` int(11) DEFAULT NULL,
  `group_list_cache` text COLLATE utf8_unicode_ci,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index_users_on_reset_password_token` (`reset_password_token`) USING BTREE,
  UNIQUE KEY `index_users_on_username_and_application_instance_id` (`username`,`application_instance_id`) USING BTREE,
  KEY `index_users_on_application_instance_id` (`application_instance_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=30006 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `groups_users` (
  `group_id` int(11) DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  UNIQUE KEY `index_groups_users_on_group_id_and_user_id` (`group_id`,`user_id`) USING BTREE,
  KEY `index_groups_users_on_group_id` (`group_id`) USING BTREE,
  KEY `index_groups_users_on_user_id` (`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `groups` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `application_instance_id` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `description` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `group_type` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1045 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

EXPLAIN for query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: assignment_completions
         type: index
possible_keys: PRIMARY,index_assignment_completions_on_first_name,index_assignment_completions_on_last_name,index_assignment_completions_on_user_id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 199088
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: users
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: lms.assignment_completions.user_id
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: groups_users
         type: ref
possible_keys: index_groups_users_on_user_id
          key: index_groups_users_on_user_id
      key_len: 5
          ref: lms.users.id
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: groups
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: lms.groups_users.group_id
         rows: 1
     filtered: 100.00
        Extra: Using index

EXPLAIN for count query:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 199088
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: assignment_completions
         type: index
possible_keys: PRIMARY,index_assignment_completions_on_first_name,index_assignment_completions_on_last_name,index_assignment_completions_on_user_id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 199088
     filtered: 100.00
        Extra: Using where
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: users
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: lms.assignment_completions.user_id
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: groups_users
         type: ref
possible_keys: index_groups_users_on_user_id
          key: index_groups_users_on_user_id
      key_len: 5
          ref: lms.users.id
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 2
  select_type: DERIVED
        table: groups
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: lms.groups_users.group_id
         rows: 1
     filtered: 100.00
        Extra: Using index

I need to count the total results for paging purposes.

EDIT

Occasionally this query is modified, which is why the groups are joined. Sometimes a where clause is added for groups:

AND groups.name LIKE "%abc%"

For this reason the join with the groups table is necessary.

like image 931
orourkedd Avatar asked Mar 19 '14 18:03

orourkedd


People also ask

How do I optimize a derived table in MySQL?

2.4 Optimizing Derived Tables and View References with Merging or Materialization. The optimizer can handle derived table references using two strategies (which also apply to view references): Merge the derived table into the outer query block. Materialize the derived table to an internal temporary table.

Which of the following is are hints in MySQL?

hint_name : These hint names are permitted: BKA , NO_BKA : Enable or disable batched key access for the specified tables. BNL , NO_BNL : Enable or disable block nested loop for the specified tables. In MySQL 8.0.


2 Answers

To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool. If your table does not change often, using the MySQL query cache is a good solution. To get a fast count.

You can also force InnoDB to use an index:

  SELECT COUNT(id) FROM assignment_completions USE INDEX (PRIMARY);

Other then that , i see you using many indexes,which will slow your query.

try use only the index on id which you will count on.

Innodb does not have cached row count. so count(*) without where clause is slow with Innodb tables.

this may help you

like image 156
echo_Me Avatar answered Oct 19 '22 22:10

echo_Me


I would have the following covering indexes on the data so the engine does not have to go back to the raw data pages. Also, since the only purpose you are going from users to groups_users, but the groups users is based on the user id anyhow, so from the assignment completions directly to the groups_users table

table                   index
assignment_completions  ( handler, id, user_id, first_name, last_name )
groups_users            ( user_id, group_id )
groups                  ( id, name )

SELECT STRAIGHT_JOIN
      assignment_completions.id as id,
      assignment_completions.first_name,
      assignment_completions.last_name,
      groups.name
   FROM
      assignment_completions
         LEFT JOIN groups_users 
            ON assignment_completions.user_id = groups_users.user_id
            LEFT JOIN groups 
               ON groups_users.group_id = groups.id
   WHERE
      assignment_completions.handler = 'course'
   GROUP BY 
      assignment_completions.id

@orourkedd, I looked again at the table structure. Really, would you ever expect a name (first/last) to EVER be more than 20-25 characters? Also, the handler field to 255 chars? Really? If that is allowing the data pages to be over-bloated, could that be part of a delay issue? Similarly in the user's table. Also, if the assignment_completion's first / last name is that of the user who completed the assignment, storing just the user's ID to the table instead would be MUCH easier long term and shorten the disk requirements on assignment completions. Yes, an extra join, but since it would be based on an "ID" column it would be fast, especially if the user's table was a covering index on (id, first_name, last_name), but definitely would shrink down the size of those columns. Similarly to the groups "name" column. Because I don't know how the actual indexes are prepared, if it IS building out allocation of 255 per field, that could hinder the indexed performance.

After re-reviewing my own comments. It appears you have a handler lookup table to (by reference to the handler_id column). I would change the index from (handler, ...) to (handler_id, ...) on the assignment_completions table and REMOVE the handler column as the duplication over 200k rows vs a single lookup table ID pointer will cut amount of pages to blow through. Likewise for saving a "userID" vs the full first/last name in the assignment completions table. This is forcing 3 columns of up to 255 chars each for 200k records (as of now and obviously going to grow).

like image 25
DRapp Avatar answered Oct 19 '22 23:10

DRapp