Optimising MySQL queries across hierarchical data

Tags:

I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y) (of cardinality ~100 x ~1000) and all edges are strictly increasing in x.

There is furthermore a dictionary of ~1k (key, val) pairs associated with each vertex.

I am currently storing the graph in a MySQL database across three (InnoDB) tables: a table of vertices (which I don't think is relevant to my question, so I have omitted to include both it and the foreign key constraints that refer to it in my extracts below); a table which holds the dictionaries; and a 'closure table' of connected vertices as described so eloquently by Bill Karwin.

The table of vertex dictionaries is defined as follows:

Click to copy

CREATE TABLE `VertexDictionary` (
  `x`   smallint(6) unsigned NOT NULL,
  `y`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  `val` smallint(1) DEFAULT NULL,
  PRIMARY KEY (`x`, `y`  , `key`),
  KEY  `dict` (`x`, `key`, `val`)
);

and the closure table of connected vertices as:

Click to copy

CREATE TABLE `ConnectedVertices` (
  `tail_x` smallint(6) unsigned NOT NULL,
  `tail_y` smallint(6) unsigned NOT NULL,
  `head_x` smallint(6) unsigned NOT NULL,
  `head_y` smallint(6) unsigned NOT NULL,
  PRIMARY KEY   (`tail_x`, `tail_y`, `head_x`),
  KEY `reverse` (`head_x`, `head_y`, `tail_x`),
  KEY `fx` (`tail_x`, `head_x`),
  KEY `rx` (`head_x`, `tail_x`)
);

There is also a dictionary of (x, key) pairs such that for each such pair, all vertices identified with that x have within their dictionaries a value for that key. This dictionary is stored in a fourth table:

Click to copy

CREATE TABLE `SpecialKeys` (
  `x`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  PRIMARY KEY (`x`),
  KEY `xkey`  (`x`, `key`)
);

I often wish to extract the set of keys used in the dictionaries of all vertices having a particular x=X, together with the associated value of any SpecialKeys connected to the left:

Click to copy

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
  `v`.`x` = X
;

for which the EXPLAIN output is:

Click to copy

id   select_type   table   type     possible_keys           key       key_len   ref                                rows   Extra
 1   SIMPLE        k       index    PRIMARY,xkey            xkey          154   NULL                                 40   Using index; Using temporary
 1   SIMPLE        c       ref      PRIMARY,reverse,fx,rx   PRIMARY         2   db.k.x                                1   Using where
 1   SIMPLE        v       ref      PRIMARY,dict            PRIMARY         4   const,db.c.head_y                   136   Using index
 1   SIMPLE        u       eq_ref   PRIMARY,dict            PRIMARY       156   db.c.tail_x,db.c.tail_y,db.k.key      1   Using where

But this query takes ~10s to complete. Been banging my head against a brick wall trying to improve matters, but to no avail.

Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!

UPDATE

I'm still getting nowhere with this, although I did rebuild the tables and found the EXPLAIN output to be slightly different (as now shown above, the number of rows fetched from v had increased from 1 to 136!); the query is still taking ~10s to execute.

I really don't understand what's going on here. Queries to obtain all (x, y, SpecialValue) and all (x, y, key) tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes over fifty times longer than their combined time... how can I improve the time taken to perform that join?

Output of SHOW VARIABLES LIKE '%innodb%'; below:

Click to copy

Variable_name                    Value
------------------------------------------------------------
have_innodb                      YES
ignore_builtin_innodb            ON
innodb_adaptive_flushing         ON
innodb_adaptive_hash_index       ON
innodb_additional_mem_pool_size  2097152
innodb_autoextend_increment      8
innodb_autoinc_lock_mode         1
innodb_buffer_pool_size          1179648000
innodb_change_buffering          inserts
innodb_checksums                 ON
innodb_commit_concurrency        0
innodb_concurrency_tickets       500
innodb_data_file_path            ibdata1:10M:autoextend
innodb_data_home_dir             /rdsdbdata/db/innodb
innodb_doublewrite               ON
innodb_fast_shutdown             1
innodb_file_format               Antelope
innodb_file_format_check         Barracuda
innodb_file_per_table            ON
innodb_flush_log_at_trx_commit   1
innodb_flush_method              O_DIRECT
innodb_force_recovery            0
innodb_io_capacity               200
innodb_lock_wait_timeout         50
innodb_locks_unsafe_for_binlog   OFF
innodb_log_buffer_size           8388608
innodb_log_file_size             134217728
innodb_log_files_in_group        2
innodb_log_group_home_dir        /rdsdbdata/log/innodb
innodb_max_dirty_pages_pct       75
innodb_max_purge_lag             0
innodb_mirrored_log_groups       1
innodb_old_blocks_pct            37
innodb_old_blocks_time           0
innodb_open_files                300
innodb_read_ahead_threshold      56
innodb_read_io_threads           4
innodb_replication_delay         0
innodb_rollback_on_timeout       OFF
innodb_spin_wait_delay           6
innodb_stats_method              nulls_equal
innodb_stats_on_metadata         ON
innodb_stats_sample_pages        8
innodb_strict_mode               OFF
innodb_support_xa                ON
innodb_sync_spin_loops           30
innodb_table_locks               ON
innodb_thread_concurrency        0
innodb_thread_sleep_delay        10000
innodb_use_sys_malloc            ON
innodb_version                   1.0.16
innodb_write_io_threads          4

546

asked Apr 18 '12 14:04

eggyal

1 Answers

Without spending time testing it, you provided an incomplete example? you should definitely try reordering of joined tables. Explain output provides some info, let's say ordering by key_len should be heuristically fastest. First table to be filtered on should be listed as last in case the optimizer is not able to figure that out, I believe.

So, let's say 'c, v, k, u' order is the best.

Click to copy

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `VertexDictionary`  AS `v`
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
           AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;

'rows' would suggest 'c/u, k, v' order, but that depends on data:

Click to copy

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `VertexDictionary`  AS `v`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
 WHERE
  `v`.`x` = X
;

Hope this helps.

UPDATE (avoiding the varchar join):

Click to copy

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
AND
  `v`.`x` = X
;

answered Sep 24 '22 09:09

MarianP

Related questions
                            
                                mysql create user only when the user doesn't exist
                            
                                How to create this table? #1071 - Specified key was too long; max key length is 1000 bytes
                            
                                Using if(isset($_POST['submit'])) to not display echo when script is open is not working
                            
                                Symfony3 - SQLSTATE[HY000] [2002] No such file or directory
                            
                                phpMyAdmin won't let me login - no error shown
                            
                                reset root password with wrong mysql config
                            
                                CLSQL and MYSQL on OS X
                            
                                Is this a safe way to increment and get the value of a counter in Django?
                            
                                Customized Customer Grid causing MySQL table scans and filesorts aka slow performance
                            
                                MySqlClient blacklisting server in ServerPool
                            
                                How to locate Azure MySQL database connection string?
                            
                                Fulltext index combined with normal index
                            
                                How to bulk insert in mySql and node.js using mysljs
                            
                                Join two subquery in MySQL

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Optimising MySQL queries across hierarchical data

Tags:

sql

data-structures

mysql

database-design

query-optimization

eggyal

People also ask

1 Answers

MarianP

Recent Activity

Donate For Us