General rules for simplifying SQL statements

Tags:

complexity-theory

sql

logic

reduction

I'm looking for some "inference rules" (similar to set operation rules or logic rules) which I can use to reduce a SQL query in complexity or size. Does there exist something like that? Any papers, any tools? Any equivalencies that you found on your own? It's somehow similar to query optimization, but not in terms of performance.

To state it different: Having a (complex) query with JOINs, SUBSELECTs, UNIONs is it possible (or not) to reduce it to a simpler, equivalent SQL statement, which is producing the same result, by using some transformation rules?

So, I'm looking for equivalent transformations of SQL statements like the fact that most SUBSELECTs can be rewritten as a JOIN.

542

asked Jul 01 '09 14:07

MicSim

1 Answer

To state it different: Having a (complex) query with JOINs, SUBSELECTs, UNIONs is it possible (or not) to reduce it to a simpler, equivalent SQL statement, which is producing the same result, by using some transformation rules?

That's exactly what optimizers do for a living (not that I'm saying they always do this well).

Since SQL is a set-based language, there is usually more than one way to transform one query into another.

Like this query:

SELECT  *
FROM    mytable
WHERE   col1 > @value1 OR
        col2 < @value2

can be transformed into this:

SELECT  *
FROM    mytable
WHERE   col1 > @value1
UNION
SELECT  *
FROM    mytable
WHERE   col2 < @value2

or this:

SELECT  mo.*
FROM    (
        SELECT  id
        FROM    mytable
        WHERE   col1 > @value1
        UNION
        SELECT  id
        FROM    mytable
        WHERE   col2 < @value2
        ) mi
JOIN    mytable mo
ON      mo.id = mi.id

, which look uglier but can yield better execution plans.
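A quick way to convince yourself that the three forms return the same rows is to run them side by side. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real RDBMS, the sample rows are made up, and the named parameters :v1 and :v2 play the role of @value1 and @value2. Keep in mind that UNION removes duplicate rows, so the rewrite is only safe when the selected rows are unique (here id is a primary key).

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable (id, col1, col2) VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 1, 10), (3, 10, 10), (4, 1, 1), (5, None, 0)],
)
params = {"v1": 5, "v2": 5}

QUERIES = {
    "or": """
        SELECT * FROM mytable
        WHERE col1 > :v1 OR col2 < :v2
    """,
    "union": """
        SELECT * FROM mytable WHERE col1 > :v1
        UNION
        SELECT * FROM mytable WHERE col2 < :v2
    """,
    "union_join": """
        SELECT mo.*
        FROM (
            SELECT id FROM mytable WHERE col1 > :v1
            UNION
            SELECT id FROM mytable WHERE col2 < :v2
        ) mi
        JOIN mytable mo ON mo.id = mi.id
    """,
}


def ids(sql):
    """Run a query and return the sorted ids of the rows it produces."""
    return sorted(row[0] for row in conn.execute(sql, params))


# Row order is not guaranteed by any of the queries, hence the sorting.
results = {name: ids(sql) for name, sql in QUERIES.items()}
print(results)
```

All three entries in results hold the same list of ids for this data set.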

One of the most common things to do is replacing this query:

SELECT  *
FROM    mytable
WHERE   col IN
        (
        SELECT  othercol
        FROM    othertable
        )

with this one:

SELECT  *
FROM    mytable mo
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    othertable o
        WHERE   o.othercol = mo.col
        )
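The IN and EXISTS forms can be checked against each other the same way. This is a minimal sketch with invented data (including NULLs and a duplicate value in othertable), run on SQLite through Python's sqlite3 module. For contrast it also runs a plain JOIN, which is not an equivalent rewrite unless othercol is unique, because every matching row in othertable produces one output row.

```python
import sqlite3

# Hypothetical sample data; note the duplicate 3 and the NULLs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, col INTEGER);
    CREATE TABLE othertable (othercol INTEGER);
    INSERT INTO mytable (id, col) VALUES (1, 1), (2, 2), (3, 3), (4, NULL);
    INSERT INTO othertable (othercol) VALUES (1), (3), (3), (NULL);
    """
)


def ids(sql):
    """Run a query and return the sorted ids of the rows it produces."""
    return sorted(row[0] for row in conn.execute(sql))


in_ids = ids(
    "SELECT * FROM mytable WHERE col IN (SELECT othercol FROM othertable)"
)
exists_ids = ids(
    """
    SELECT * FROM mytable mo
    WHERE EXISTS (SELECT NULL FROM othertable o WHERE o.othercol = mo.col)
    """
)
# A naive JOIN rewrite: the duplicate 3 in othertable duplicates row 3.
join_ids = ids(
    "SELECT mo.* FROM mytable mo JOIN othertable o ON o.othercol = mo.col"
)

print(in_ids, exists_ids, join_ids)
```

The equivalence shown here is for IN only; NOT IN and NOT EXISTS treat NULLs differently and are not interchangeable in general.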

In some RDBMS's (like PostgreSQL), DISTINCT and GROUP BY use different execution plans, so sometimes it's better to replace one with the other:

SELECT  mo.grouper,
        (
        SELECT  SUM(col)
        FROM    mytable mi
        WHERE   mi.grouper = mo.grouper
        )
FROM    (
        SELECT  DISTINCT grouper
        FROM    mytable
        ) mo

vs.

SELECT  mo.grouper, SUM(col)
FROM    mytable mo
GROUP BY
        mo.grouper

In PostgreSQL versions before 8.4, DISTINCT always sorts, while GROUP BY can hash.
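The two forms can be compared on a toy table. The sketch below assumes SQLite (via Python's sqlite3) and invented rows, so it says nothing about PostgreSQL's plans; it only shows that the results match. One caveat worth knowing: if grouper can be NULL, the correlated subquery compares with = and returns NULL for that group, while GROUP BY still sums it, so the rewrite is only safe for non-NULL grouping values.

```python
import sqlite3

# Hypothetical sample data with non-NULL grouping values.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (grouper TEXT NOT NULL, col INTEGER);
    INSERT INTO mytable (grouper, col)
    VALUES ('a', 1), ('a', 2), ('b', 5), ('c', NULL);
    """
)

# DISTINCT plus a correlated subquery per distinct value.
distinct_rows = sorted(
    conn.execute(
        """
        SELECT mo.grouper,
               (SELECT SUM(col) FROM mytable mi WHERE mi.grouper = mo.grouper)
        FROM (SELECT DISTINCT grouper FROM mytable) mo
        """
    )
)

# Plain GROUP BY over the same table.
group_rows = sorted(
    conn.execute(
        "SELECT mo.grouper, SUM(col) FROM mytable mo GROUP BY mo.grouper"
    )
)

print(distinct_rows)
print(group_rows)
```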

MySQL lacks FULL OUTER JOIN, so it has to be emulated as follows:

SELECT  t1.col1, t2.col2
FROM    table1 t1
FULL OUTER JOIN
        table2 t2
ON      t1.id = t2.id

vs.

SELECT  t1.col1, t2.col2
FROM    table1 t1
LEFT JOIN
        table2 t2
ON      t1.id = t2.id
UNION ALL
SELECT  NULL, t2.col2
FROM    table1 t1
RIGHT JOIN
        table2 t2
ON      t1.id = t2.id
WHERE   t1.id IS NULL

, but see this article in my blog on how to do this more efficiently in MySQL:

Emulating FULL OUTER JOIN in MySQL
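Here is a small sanity check of the LEFT JOIN plus anti-join emulation. It is a sketch on SQLite through Python's sqlite3 with made-up rows. Older SQLite versions have no RIGHT JOIN either, so the second branch is written as a LEFT JOIN with the tables swapped, which is the same thing; the expected result is written out by hand rather than taken from a real FULL OUTER JOIN.

```python
import sqlite3

# Hypothetical sample data: id 1 exists only in table1, id 3 only in table2.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col2 TEXT);
    INSERT INTO table1 (id, col1) VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 (id, col2) VALUES (2, 'x'), (3, 'y');
    """
)

emulated = conn.execute(
    """
    -- All rows of table1, matched or not
    SELECT t1.col1, t2.col2
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    UNION ALL
    -- Plus the rows of table2 that have no match in table1
    SELECT NULL, t2.col2
    FROM table2 t2
    LEFT JOIN table1 t1 ON t1.id = t2.id
    WHERE t1.id IS NULL
    """
).fetchall()

# What a FULL OUTER JOIN on id would return for this data.
expected = {("a", None), ("b", "x"), (None, "y")}
print(emulated)
```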

This hierarchical query in Oracle:

SELECT  DISTINCT(animal_id) AS animal_id
FROM    animal
START WITH
        animal_id = :id
CONNECT BY
        PRIOR animal_id IN (father, mother)
ORDER BY
        animal_id

can be transformed to this:

SELECT  DISTINCT(animal_id) AS animal_id
FROM    (
        SELECT  0 AS gender, animal_id, father AS parent
        FROM    animal
        UNION ALL
        SELECT  1, animal_id, mother
        FROM    animal
        )
START WITH
        animal_id = :id
CONNECT BY
        parent = PRIOR animal_id
ORDER BY
        animal_id

, the latter one being more performant.

See this article in my blog for the execution plan details:

Genealogy query on both parents
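CONNECT BY is Oracle-specific, but the idea of the rewrite (unpivot father and mother into a single parent column so the recursion joins on a plain equality) can be tried elsewhere. The sketch below is my own translation to a recursive CTE on SQLite via Python's sqlite3, with an invented family tree; it only checks that both forms find the same descendants and makes no claim about Oracle's execution plans.

```python
import sqlite3

# Hypothetical family tree: 3 is a child of 1 and 2, 4 of 3, 5 of 2, 6 of 4 and 5.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, father INTEGER, mother INTEGER);
    INSERT INTO animal (animal_id, father, mother) VALUES
        (1, NULL, NULL), (2, NULL, NULL), (3, 1, 2),
        (4, 3, NULL), (5, 2, NULL), (6, 4, 5);
    """
)
params = {"id": 1}

# Recursion step uses IN (father, mother), like the first Oracle query.
in_form = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE tree(animal_id) AS (
            SELECT animal_id FROM animal WHERE animal_id = :id
            UNION
            SELECT a.animal_id
            FROM animal a
            JOIN tree t ON t.animal_id IN (a.father, a.mother)
        )
        SELECT animal_id FROM tree ORDER BY animal_id
        """,
        params,
    )
]

# Recursion step uses an equality on the unpivoted parent column.
unpivot_form = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE link(animal_id, parent) AS (
            SELECT animal_id, father FROM animal
            UNION ALL
            SELECT animal_id, mother FROM animal
        ),
        tree(animal_id) AS (
            SELECT animal_id FROM animal WHERE animal_id = :id
            UNION
            SELECT l.animal_id
            FROM link l
            JOIN tree t ON l.parent = t.animal_id
        )
        SELECT animal_id FROM tree ORDER BY animal_id
        """,
        params,
    )
]

print(in_form, unpivot_form)
```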

To find all ranges that overlap the given range, you can use the following query:

SELECT  *
FROM    ranges
WHERE   end_date >= @start
        AND start_date <= @end

, but in SQL Server this more complex query yields the same results faster:

SELECT  *
FROM    ranges
WHERE   (start_date > @start AND start_date <= @end)
        OR (@start BETWEEN start_date AND end_date)

, and believe it or not, I have an article in my blog on this too:

Overlapping ranges: SQL Server
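The two predicates are equivalent as long as every stored range has start_date <= end_date and the search range has @start <= @end. A brute-force check makes that easy to believe. The sketch below assumes SQLite via Python's sqlite3, uses small integers in place of dates, and renames the parameters to :range_start and :range_end; all of that is illustrative only.

```python
import sqlite3

# Every range [s, e] with 0 <= s <= e <= 7; integers stand in for dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranges (id INTEGER PRIMARY KEY, start_date INTEGER, end_date INTEGER)"
)
conn.executemany(
    "INSERT INTO ranges (start_date, end_date) VALUES (?, ?)",
    [(s, e) for s in range(8) for e in range(s, 8)],
)
params = {"range_start": 3, "range_end": 5}

simple_rows = sorted(
    conn.execute(
        """
        SELECT start_date, end_date FROM ranges
        WHERE end_date >= :range_start AND start_date <= :range_end
        """,
        params,
    )
)

rewritten_rows = sorted(
    conn.execute(
        """
        SELECT start_date, end_date FROM ranges
        WHERE (start_date > :range_start AND start_date <= :range_end)
           OR (:range_start BETWEEN start_date AND end_date)
        """,
        params,
    )
)

# The same overlap test written directly in Python, as a reference.
reference_rows = sorted(
    (s, e) for s in range(8) for e in range(s, 8) if e >= 3 and s <= 5
)

print(len(simple_rows), len(rewritten_rows))
```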

SQL Server (before 2012, which added ORDER BY support to windowed aggregates) also lacks an efficient way to do cumulative aggregates, so this query:

SELECT  mi.id, SUM(mo.value) AS running_sum
FROM    mytable mi
JOIN    mytable mo
ON      mo.id <= mi.id
GROUP BY
        mi.id

can be more efficiently rewritten using, Lord help me, cursors (you heard me right: cursors, more efficiently and SQL Server in one sentence).

See this article in my blog on how to do it:

Flattening timespans: SQL Server
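The reason a cursor can win is that the self-join touches roughly n²/2 row pairs, while a cursor walks the rows once in id order and keeps a running total. The sketch below shows the two approaches side by side; it is not the T-SQL from the article, just an illustration using Python's sqlite3 with made-up data, where the Python loop over an ordered result set plays the role of the cursor.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, value) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, -5), (4, 7)],
)

# Set-based version: each row is joined to all rows with a smaller or equal id.
join_rows = conn.execute(
    """
    SELECT mi.id, SUM(mo.value) AS running_sum
    FROM mytable mi
    JOIN mytable mo ON mo.id <= mi.id
    GROUP BY mi.id
    ORDER BY mi.id
    """
).fetchall()

# Cursor-style version: one ordered pass, carrying the total along.
cursor_rows = []
total = 0
for row_id, value in conn.execute("SELECT id, value FROM mytable ORDER BY id"):
    total += value
    cursor_rows.append((row_id, total))

print(join_rows)
print(cursor_rows)
```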

There is a certain kind of query commonly met in financial applications that searches for the effective rate for a currency, like this one in Oracle:

SELECT  TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999')
FROM    t_transaction x
JOIN    t_rate r
ON      (rte_currency, rte_date) IN
        (
        SELECT  xac_currency, MAX(rte_date)
        FROM    t_rate
        WHERE   rte_currency = xac_currency
                AND rte_date <= xac_date
        )

This query can be heavily rewritten to use an equality condition which allows a HASH JOIN instead of NESTED LOOPS:

WITH v_rate AS
        (
        SELECT  cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
        FROM    (
                SELECT  cur_id, dte_date,
                        (
                        SELECT  MAX(rte_date)
                        FROM    t_rate ri
                        WHERE   rte_currency = cur_id
                                AND rte_date <= dte_date
                        ) AS rte_effdate
                FROM    (
                        SELECT  (
                                SELECT  MAX(rte_date)
                                FROM    t_rate
                                ) - level + 1 AS dte_date
                        FROM    dual
                        CONNECT BY
                                level <=
                                (
                                SELECT  MAX(rte_date) - MIN(rte_date)
                                FROM    t_rate
                                )
                        ) v_date,
                        (
                        SELECT  1 AS cur_id
                        FROM    dual
                        UNION ALL
                        SELECT  2 AS cur_id
                        FROM    dual
                        ) v_currency
                ) v_eff
        LEFT JOIN
                t_rate
        ON      rte_currency = cur_id
                AND rte_date = rte_effdate
        )
SELECT  TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM    (
        SELECT  xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
        FROM    t_transaction x
        GROUP BY
                xac_currency, TRUNC(xac_date)
        )
JOIN    v_rate
ON      eff_currency = xac_currency
        AND eff_date = xac_date

Despite being bulky as hell, the latter query is 6 times faster.

The main idea here is replacing <= with =, which requires building an in-memory calendar table to JOIN with.

Converting currencies
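To see the calendar trick in isolation, here is a stripped-down sketch on SQLite via Python's sqlite3. Everything in it is an assumption made for illustration: dates are plain integers, rates are integers, the calendar is generated with a recursive CTE instead of CONNECT BY, it runs from the first rate date to the last transaction date, and the currency list is read from t_rate rather than hard-coded. It only checks that the equality-join form gives the same total as the range-lookup form.

```python
import sqlite3

# Hypothetical rates and transactions; dates are day numbers.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t_rate (rte_currency INTEGER, rte_date INTEGER, rte_rate INTEGER);
    CREATE TABLE t_transaction (xac_currency INTEGER, xac_date INTEGER, xac_amount INTEGER);
    INSERT INTO t_rate VALUES (1, 1, 2), (1, 4, 3), (2, 1, 10), (2, 3, 20);
    INSERT INTO t_transaction VALUES
        (1, 2, 100), (1, 5, 10), (2, 3, 1), (2, 2, 2), (1, 2, 50);
    """
)

# Range form: each transaction looks up its latest rate with <=.
range_total = conn.execute(
    """
    SELECT SUM(x.xac_amount * r.rte_rate)
    FROM t_transaction x
    JOIN t_rate r
      ON r.rte_currency = x.xac_currency
     AND r.rte_date = (
            SELECT MAX(ri.rte_date) FROM t_rate ri
            WHERE ri.rte_currency = x.xac_currency
              AND ri.rte_date <= x.xac_date
         )
    """
).fetchone()[0]

# Equality form: precompute the effective rate for every (currency, day),
# then join pre-aggregated transactions on plain equality.
equality_total = conn.execute(
    """
    WITH RECURSIVE
    bounds(lo, hi) AS (
        SELECT MIN(rte_date), (SELECT MAX(xac_date) FROM t_transaction)
        FROM t_rate
    ),
    v_date(dte_date) AS (
        SELECT lo FROM bounds
        UNION ALL
        SELECT dte_date + 1 FROM v_date, bounds WHERE dte_date < hi
    ),
    v_currency(cur_id) AS (
        SELECT DISTINCT rte_currency FROM t_rate
    ),
    v_eff AS (
        SELECT cur_id, dte_date,
               (SELECT MAX(ri.rte_date) FROM t_rate ri
                WHERE ri.rte_currency = cur_id
                  AND ri.rte_date <= dte_date) AS rte_effdate
        FROM v_date CROSS JOIN v_currency
    ),
    v_rate AS (
        SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
        FROM v_eff
        LEFT JOIN t_rate ON rte_currency = cur_id AND rte_date = rte_effdate
    )
    SELECT SUM(x.xac_amount * v.eff_rate)
    FROM (
        SELECT xac_currency, xac_date, SUM(xac_amount) AS xac_amount
        FROM t_transaction
        GROUP BY xac_currency, xac_date
    ) x
    JOIN v_rate v
      ON v.eff_currency = x.xac_currency AND v.eff_date = x.xac_date
    """
).fetchone()[0]

print(range_total, equality_total)
```

The <= lookup still happens, but once per (currency, day) pair in the calendar instead of once per transaction, which is where the gain comes from when transactions far outnumber days.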

156

answered Oct 16 '22 11:10

Quassnoi
