Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Composite Primary and Cardinality

I have some questions on Composite Primary Keys and the cardinality of the columns. I searched the web, but did not find any definitive answer, so I am trying again. The questions are:

Context: Large (50M - 500M rows) OLAP Prep tables, not NOSQL, not Columnar. MySQL and DB2

1) Does the order of keys in a PK matter?

2) If the cardinality of the columns varies heavily, which should be used first. For example, if I have CLIENT/CAMPAIGN/PROGRAM where CLIENT is highly cardinal, CAMPAIGN is moderate, PROGRAM is almost like a bitmap index, what order is the best?

3) What order is the best for Join, if there is a Where clause and when there is no Where Clause (for views)

Thanks in advance.

like image 872
srini.venigalla Avatar asked May 20 '10 15:05

srini.venigalla


1 Answers

You've got "MySQL and DB2". This answer is for DB2, MySQL has none of this.

Yes, of course that is logical, but the optimiser takes much more than just that into account.

Generally, the order of the columns in the WHERE clause (join) do not (and should not) matter.

However, there are two items related to the order of predicates which may be the reason for your question.

  1. What does matter, is the order of the columns in the index, against which the WHERE clause is processed. Yes, there it is best to specify the columns in the order of highest cardinality to lowest. That allows the optimiser to target a smaller range of rows.

    • And along those lines do not bother implementing indices for single-column, low cardinality columns (there are useless). If the index is correct, then it will be used more often.
      .
  2. The order of tables being joined (not columns in the join) matters very much, it is probably the most important consideration. In fact Join Transitive Closure is automatic, and the optimiser evaluates all possible join orders, and chooses what it thinks is the best, based on Statistics (which is why UPDATE STATS is so important).

    Regardless of the no of rows in the tables, if you are joining 100 rows from table_A on a bad index with 1,000,000 rows in table_B on a good index, you want the order A:B, not B:A. If you are getting less than the max IOPS, you may want to do something about it.

    The correct sequence of steps is, no surprise:

    • check that the index is correct as per (1). Do not just add another index, correct the ones you have.

    • check that update stats is being executed regularly

    • always try the default operation of the optimiser first. Set stats on and measure I/Os. Use representative sets of values (that the user will use in production).

    • check the shoowplan, to ensure that the code is correct. Of course that will also identify the join order chosen.

    • if the performance is not good enough, and you believe that the the join order chosen by the optimiser for those sets of values is sub-optimal, SET JTC OFF (syntax depends on your version of DB2), then specify the order that you want in the WHERE clause. Measure I/Os. Use representative sets

    • form an opinion. Choose whichever is better performance overall. Never tune for single queries.

like image 185
PerformanceDBA Avatar answered Sep 22 '22 02:09

PerformanceDBA