I was writing a query in Visual Studio 2010 (that is, I opened Server Explorer, right-clicked the server, and chose New Query). The query includes the condition
A AND B AND C AND D AND E AND F AND (G OR H)
which is in conjunctive normal form (CNF). When I ran the query (attached to MSSQL Server 2008), it changed the text into
A AND B AND C AND D AND E AND F AND G OR
A AND B AND C AND D AND E AND F AND H
which is in disjunctive normal form (DNF).
From the little I found online, it seems that DNF allows SQL to run the conjunctions separately and union them at the end.
However, for something like this, with so many repeated conditions, does DNF actually provide an advantage over CNF? If it doesn't, how can I force the optimizer to take the condition as is? If it does, should I write the query in my application code in CNF because it's shorter and neater, or in DNF because it saves time for the optimizer?
SQL query optimization is the iterative process of improving a query's performance as measured by execution time, number of disk accesses, and other cost criteria. Data is an integral part of any application.
CNF is an ∧ of ∨s, where ∨ is over variables or their negations (literals); an ∨ of literals is also called a clause. DNF is an ∨ of ∧s; an ∧ of literals is called a term.
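As a quick sanity check on the two forms of the condition in the question, here is a minimal Python sketch that enumerates all 256 truth assignments of A through H and confirms that the CNF and DNF versions agree. The function names `cnf` and `dnf` are made up for illustration, and the check uses plain two-valued logic, so it says nothing about NULL handling or about how SQL Server actually plans the query.

```python
from itertools import product

def cnf(a, b, c, d, e, f, g, h):
    # A AND B AND C AND D AND E AND F AND (G OR H)
    return a and b and c and d and e and f and (g or h)

def dnf(a, b, c, d, e, f, g, h):
    # A AND ... AND F AND G  OR  A AND ... AND F AND H
    return (a and b and c and d and e and f and g) or \
           (a and b and c and d and e and f and h)

# Every combination of True/False for the eight conditions.
assignments = list(product([False, True], repeat=8))

# The two forms are equivalent if they agree on every assignment.
equivalent = all(cnf(*v) == dnf(*v) for v in assignments)
print(len(assignments), equivalent)
```

The two forms differ only by distributing AND over OR, so they select the same rows; any difference would be in how a given engine chooses to evaluate them.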
Simply write down the truth table, which is quite simple to find, and deduce your CNF and DNF. To find the DNF, look at all rows that end with T. For each such row, take the x, y, and z values from their respective columns, negating any variable whose value is F, and join them with ∧ to form one term; then join the terms with ∨. Thus, you get (x∧y∧z)∨(x∧¬y∧¬z)∨(¬x∧y∧¬z)∨(¬x∧¬y∧z).
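The row-by-row procedure can be sketched in Python as follows. The helper name `dnf_from_truth_table` is hypothetical, and `odd_parity` is an assumed example function: the DNF quoted above is true exactly when an odd number of x, y, z are true, so that function is used here to reproduce it.

```python
from itertools import product

def dnf_from_truth_table(f, names):
    """Build a DNF string with one term per truth-table row where f is True."""
    terms = []
    # Walk the truth table row by row, starting from all-True.
    for values in product([True, False], repeat=len(names)):
        if f(*values):
            # Keep a variable as-is if it is True in this row, negate it otherwise.
            literals = [n if v else "¬" + n for n, v in zip(names, values)]
            terms.append("(" + "∧".join(literals) + ")")
    return "∨".join(terms)

def odd_parity(x, y, z):
    # True when an odd number of the inputs are True (assumed example).
    return (x + y + z) % 2 == 1

result = dnf_from_truth_table(odd_parity, ["x", "y", "z"])
print(result)
```

The CNF can be read off the same table from the rows that end with F, negating each literal and swapping the roles of ∧ and ∨.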
I don't know about the relative advantages of DNF/CNF in this situation, or even how to force the optimizer in this fashion.
Generally speaking, you don't want to force the optimizer to use the plan you currently believe is best over the one it would generate itself (there are exceptions to this, but they are rare). This is largely because the 'best' optimization may change over time as a side effect of other actions (like adding an index). If you force the optimizer to adopt a particular optimization, you lock it into that path, even if a new one would perform better.
Given that, you should write the query in the form that is easiest to read and maintain (CNF), and let the optimizer change it if necessary - this is the whole point of SQL being a declarative language, to allow the optimizer to muck with things as necessary.
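To illustrate that application code can keep the shorter form without changing the result, here is a small sketch using Python's built-in `sqlite3` module. Everything in it is an assumption for demonstration: the table `t`, its columns, and the sample rows are invented, the condition is shortened to two shared conjuncts, and SQLite stands in for SQL Server, so it shows only that both forms return the same rows, not how either engine plans them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, g INT, h INT)")

# Sixteen sample rows covering every 0/1 combination of a, b, g, h.
rows = [(i, i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) for i in range(16)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)

# Shorter form, as one would write it by hand.
cnf_sql = ("SELECT id FROM t "
           "WHERE a = 1 AND b = 1 AND (g = 1 OR h = 1) ORDER BY id")

# Expanded form, like the text the query editor produced.
dnf_sql = ("SELECT id FROM t "
           "WHERE a = 1 AND b = 1 AND g = 1 OR a = 1 AND b = 1 AND h = 1 ORDER BY id")

cnf_ids = [r[0] for r in conn.execute(cnf_sql)]
dnf_ids = [r[0] for r in conn.execute(dnf_sql)]
print(cnf_ids == dnf_ids)
conn.close()
```

To compare the two forms on a real server, the execution plans it reports for each are the thing to look at rather than the query text.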