I have a relatively large 4-deep relational data setup like this:
ClientApplication has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs has_many => Logs
client_applications
: (potentially 1,000's of records)
- ...
- account_id
- public_key
- deleted_at
client_application_versions
: (potentially 10,000's of records)
- ...
- client_application_id
- public_key
- deleted_at
cloud_logs
: (potentially 1,000,000's of records)
- ...
- client_application_version_id
- public_key
- deleted_at
logs
: (potentially 1,000,000,000's of records)
- ...
- cloud_log_id
- public_key
- time_stamp
- deleted_at
I am still in development so the structure and setup is not set in stone, but I hope it is setup ok. Using Rails 3.2.11 and InnoDB MySQL. The database is currently filled with a small (compared to the eventual db size) set of data (logs
only has 500,000 rows) I have 4 scoped queries, 3 of which are problematic, to retrieve logs.
account_id
, client_application.public_key
, client_application_version.public_key
(Over 100 seconds)account_id
, client_application.public_key
(Over 100 seconds)account_id
(Over 100 seconds)I am using rails scopes to help make these calls:
scope :account_id, proc {|account_id| joins(:client_application).where("client_applications.account_id = ?", account_id) }
scope :client_application_key, proc {|client_application_key| joins(:client_application).where("client_applications.public_key = ?", client_application_key) }
scope :client_application_version_key, proc {|client_application_version_key| joins(:client_application_version).where("client_application_versions.public_key = ?", client_application_version_key) }
default_scope order('logs.timestamp DESC')
I have indices on each table on public_key
. I have several indices on the logs
table including the one that the optimizer prefers to use (index_logs_on_cloud_log_id
), but the queries are still taking eons to run.
Here is how I am calling the method in rails console
:
Log.account_id(1).client_application_key('p0kZudG0').client_application_version_key('0HgoJRyE').page(1)
... here is what rails turns it into:
SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `cloud_logs`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` ON `client_application_versions`.`id` = `cloud_logs`.`client_application_version_id` INNER JOIN `client_applications` ON `client_applications`.`id` = `client_application_versions`.`client_application_id` INNER JOIN `cloud_logs` `cloud_logs_logs_join` ON `cloud_logs_logs_join`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` `client_application_versions_logs` ON `client_application_versions_logs`.`id` = `cloud_logs_logs_join`.`client_application_version_id` WHERE (logs.deleted_at IS NULL) AND (client_applications.account_id = 1) AND (client_applications.public_key = 'p0kZudG0') AND (client_application_versions.public_key = '0HgoJRyE') ORDER BY logs.timestamp DESC LIMIT 100 OFFSET 0
... and here is the EXPLAIN statement for that query.
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | client_application_versions | ref | PRIMARY,index_client_application_versions_on_client_application_id,index_client_application_versions_on_public_key | index_client_application_versions_on_public_key | 768 | const | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | client_applications | eq_ref | PRIMARY,index_client_applications_on_account_id,index_client_applications_on_public_key | PRIMARY | 4 | cloudlog_production.client_application_versions.client_application_id | 1 | Using where |
| 1 | SIMPLE | cloud_logs | ref | PRIMARY,index_cloud_logs_on_client_application_version_id | index_cloud_logs_on_client_application_version_id | 5 | cloudlog_production.client_application_versions.id | 481 | Using where; Using index |
| 1 | SIMPLE | cloud_logs_logs_join | eq_ref | PRIMARY,index_cloud_logs_on_client_application_version_id | PRIMARY | 4 | cloudlog_production.cloud_logs.id | 1 | |
| 1 | SIMPLE | client_application_versions_logs | eq_ref | PRIMARY | PRIMARY | 4 | cloudlog_production.cloud_logs_logs_join.client_application_version_id | 1 | Using index |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
This question has 3 parts to it:
find
run in a more performant way?
UPDATE 1/24/12
As suggested by Geoff and J_MCCaffrey in the answers, I have split the query up into 3 different sections to try and isolate the problem. As expected, it is a problem dealing with the largest table. The MYSQL optimizer handles this differently by using different indices, but the delay persists. Here is the EXPLAIN for this approach.
ClientApplication.find_by_account_id_and_public_key(1, 'p0kZudG0').versions.select{|cav| cav.public_key = '0HgoJRyE'}.first.logs.page(2)
ClientApplication Load (165.9ms) SELECT `client_applications`.* FROM `client_applications` WHERE `client_applications`.`account_id` = 1 AND `client_applications`.`public_key` = 'p0kZudG0' AND (client_applications.deleted_at IS NULL) ORDER BY client_applications.id LIMIT 1
ClientApplicationVersion Load (105.1ms) SELECT `client_application_versions`.* FROM `client_application_versions` WHERE `client_application_versions`.`client_application_id` = 3 AND (client_application_versions.deleted_at IS NULL) ORDER BY client_application_versions.created_at DESC, client_application_versions.id DESC
Log Load (57295.0ms) SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN (214.5ms) EXPLAIN SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN for: SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | cloud_logs | index_merge | PRIMARY,index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | 5,9 | NULL | 1874 | Using intersect(index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
UPDATE 1/25/12
Here are the indices for all relevant tables:
CLIENT_APPLICATIONS:
PRIMARY KEY (`id`),
UNIQUE KEY `index_client_applications_on_key` (`key`),
KEY `index_client_applications_on_account_id` (`account_id`),
KEY `index_client_applications_on_deleted_at` (`deleted_at`),
KEY `index_client_applications_on_public_key` (`public_key`)
CLIENT_APPLICATION_VERSIONS:
PRIMARY KEY (`id`),
KEY `index_client_application_versions_on_client_application_id` (`client_application_id`),
KEY `index_client_application_versions_on_deleted_at` (`deleted_at`),
KEY `index_client_application_versions_on_public_key` (`public_key`)
CLOUD_LOGS:
PRIMARY KEY (`id`),
KEY `index_cloud_logs_on_api_client_version_id` (`api_client_version_id`),
KEY `index_cloud_logs_on_client_application_version_id` (`client_application_version_id`),
KEY `index_cloud_logs_on_deleted_at` (`deleted_at`),
KEY `index_cloud_logs_on_device_id` (`device_id`),
KEY `index_cloud_logs_on_public_key` (`public_key`),
KEY `index_cloud_logs_on_received_at` (`received_at`)
LOGS:
PRIMARY KEY (`id`),
KEY `index_logs_on_class_name` (`class_name`),
KEY `index_logs_on_cloud_log_id_and_deleted_at_and_timestamp` (`cloud_log_id`,`deleted_at`,`timestamp`),
KEY `index_logs_on_cloud_log_id_and_deleted_at` (`cloud_log_id`,`deleted_at`),
KEY `index_logs_on_cloud_log_id` (`cloud_log_id`),
KEY `index_logs_on_deleted_at` (`deleted_at`),
KEY `index_logs_on_file_name` (`file_name`),
KEY `index_logs_on_method_name` (`method_name`),
KEY `index_logs_on_public_key` (`public_key`),
KEY `index_logs_on_timestamp` USING BTREE (`timestamp`)
Displaying it with a better structure, the query looks like this (already re-arranged)
SELECT
`logs`.*
FROM
`logs` as l
INNER JOIN `cloud_logs` as cl1
ON
cl1.id = l.cloud_log_id
INNER JOIN `cloud_logs` as cl2
ON
cl2.id = l.cloud_log_id
INNER JOIN `client_application_versions` as cav1
ON
cav1.id = cl1.client_application_version_id
INNER JOIN `client_application_versions` as cav2
ON
cav2.id = cl2.client_application_version_id
INNER JOIN `client_applications` as ca
ON
ca.id = cav1.client_application_id
WHERE
(l.deleted_at IS NULL)
AND
(ca.account_id = 1)
AND
(ca.public_key = 'p0kZudG0')
AND
(cav.public_key = '0HgoJRyE')
ORDER BY
logs.timestamp DESC
LIMIT
0, 100
When looking at cav1/cl1 and cav2/cl2 one can see, that cav2 and cl2 are never used. There is no filter applied to them apart from the ON statement.
Therefore cav1 is linked to the correct accoutn, cav2 is not linked to any account, and rather contains all matching accounts. This is no problem for the result of the query, but for the size of the join buffer.
Removing the joins (and ON-parts of them) yields:
SELECT
`logs`.*
FROM
`logs` as l
INNER JOIN `cloud_logs` as cl1
ON
cl1.id = l.cloud_log_id
-- INNER JOIN `cloud_logs` as cl2
-- ON
-- cl2.id = l.cloud_log_id
INNER JOIN `client_application_versions` as cav1 use index for join (`index_cavs_on_client_application_id_and_public_key`)
ON
cav1.id = cl1.client_application_version_id
AND
cav1.public_key = '0HgoJRyE'
-- INNER JOIN `client_application_versions` as cav2
-- ON
-- cav2.id = cl2.client_application_version_id
INNER JOIN `client_applications` as ca
ON
ca.id = cav1.client_application_id
WHERE
(l.deleted_at IS NULL)
AND
(ca.account_id = 1)
AND
(ca.public_key = 'p0kZudG0')
ORDER BY
logs.timestamp DESC
LIMIT
0, 100
This one should be faster.
Packing this into something useable in console (assuming correct table relations and meta_where):
Log.where(:deleted_at.ne => nil).order("logs.timestamp desc").joins(:cloud_logs) & \
CloudLog.joins(:client_application_versions) & \
ClientApplicationVersion.where(:public_key => '0HgoJRyE').joins(:client_applications) & \
ClientApplication.where(:public_key => 'p0kZudG0', :account_id => 1)
Since i can't reproduce this here, you might have to try for yourself (or add a to_sql to the end) And also add an explain to the shortened query above.
The results might be interesting.
update: after seeing the result and definitions (comments below):
try adding a key:
alter table client_application_versions add key (`client_application_id`, `public_key`);
this should prevent the filesort and speed things up.
edit: updated query to hint mysql about the key.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With