
Measuring the complexity of SQL statements

The complexity of methods in most programming languages can be measured in cyclomatic complexity with static source code analyzers. Is there a similar metric for measuring the complexity of a SQL query?

It is simple enough to measure the time it takes a query to return, but what if I just want to be able to quantify how complicated a query is?

[Edit/Note] While getting the execution plan is useful, that is not necessarily what I am trying to identify in this case. I am not looking for how difficult it is for the server to execute the query, I am looking for a metric that identifies how difficult it was for the developer to write the query, and how likely it is to contain a defect.

[Edit/Note 2] Admittedly, there are times when measuring complexity is not useful, but there are also times when it is. For a further discussion on that topic, see this question.

epotter asked Jul 28 '10 14:07



1 Answer

Common measures of software complexity include Cyclomatic Complexity (a measure of how complicated the control flow is) and Halstead complexity (a measure of how complex the arithmetic is, based on counts of operators and operands).

The "control flow" in a SQL query is best related to the AND and OR operators in the query.

The "computational complexity" is best related to operators such as SUM or implicit JOINs.

Once you've decided how to categorize each unit of syntax of a SQL query as to whether it is "control flow" or "computation", you can straightforwardly compute Cyclomatic or Halstead measures.
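As a rough illustration of that categorization, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than an established standard: the keyword sets, the tokenizer regex, and the function names `cyclomatic_like` and `halstead_like` are all hypothetical, and the tokenizer ignores parentheses, commas, comments and quoted identifiers.

```python
import math
import re

# Assumption: these keywords count as "control flow" decision points.
DECISION_KEYWORDS = {"AND", "OR", "WHEN"}

# Assumption: these keywords count as operators ("computation"); any other
# word is treated as an operand (table, column, alias).
OPERATOR_KEYWORDS = DECISION_KEYWORDS | {
    "SELECT", "FROM", "WHERE", "JOIN", "ON", "GROUP", "ORDER", "BY",
    "HAVING", "NOT", "IN", "LIKE", "AS", "CASE", "THEN", "ELSE", "END",
    "SUM", "COUNT", "AVG", "MIN", "MAX",
}

TOKEN_RE = re.compile(
    r"'(?:[^']|'')*'"                                        # string literal
    r"|\d+(?:\.\d+)?"                                        # numeric literal
    r"|[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*"  # (qualified) name
    r"|<>|<=|>=|!=|[=<>+\-*/%]"                              # symbolic operators
)


def tokenize(sql):
    """Split a query into literals, names and operators (punctuation is skipped)."""
    return TOKEN_RE.findall(sql)


def is_operator(token):
    """Operator keywords and symbols are operators; names and literals are operands."""
    if token.upper() in OPERATOR_KEYWORDS:
        return True
    return not (token[0].isalnum() or token[0] in "_'")


def cyclomatic_like(sql):
    """One base path plus one per decision keyword found outside string literals."""
    return 1 + sum(t.upper() in DECISION_KEYWORDS for t in tokenize(sql))


def halstead_like(sql):
    """Halstead counts, volume and difficulty under the classification above."""
    tokens = tokenize(sql)
    operators = [t.upper() for t in tokens if is_operator(t)]
    operands = [t for t in tokens if not is_operator(t)]
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total counts
    vocabulary, length = n1 + n2, N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    return {"n1": n1, "n2": n2, "N1": N1, "N2": N2,
            "volume": volume, "difficulty": difficulty}


query = ("SELECT name FROM users "
         "WHERE age > 18 AND (city = 'Paris' OR city = 'Rome')")
print(cyclomatic_like(query))
print(halstead_like(query))
```

Which keywords belong in each set is exactly the judgment call described above; changing the sets changes the numbers, so scores are only comparable when computed with the same classification.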

What the SQL optimizer does to queries is, I think, absolutely irrelevant. The purpose of complexity measures is to characterize how hard it is for a person to understand the query, not how efficiently it can be evaluated.

Similarly, what the DDL says or whether views are involved shouldn't be included in such complexity measures. The assumption behind these metrics is that the complexity of the machinery inside a used abstraction isn't interesting when you simply invoke it, because presumably that abstraction does something well understood by the coder. This is why Halstead and Cyclomatic measures don't include called subroutines in their counting, and I think you can make a good case that views and DDL information are those "invoked" abstractions.

Finally, how perfectly right or how perfectly wrong these complexity numbers are doesn't matter much, as long as they reflect some truth about complexity and you can compare them relative to one another. That way you can sort the SQL fragments by complexity and focus your testing attention on the most complicated ones.
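The relative-ranking idea can be sketched in a few lines of Python. The score used here (one plus the number of AND, OR and WHEN keywords outside string literals) is only an assumed stand-in for whatever measure you settle on, and the query names and helper names (`decision_score`, `rank_by_complexity`) are hypothetical.

```python
import re

# Assumption: AND / OR / WHEN are the decision points being counted.
DECISION_RE = re.compile(r"\b(?:AND|OR|WHEN)\b", re.IGNORECASE)
# Blank out string literals first so words inside them are not counted.
STRING_RE = re.compile(r"'(?:[^']|'')*'")


def decision_score(sql):
    """Crude complexity score: 1 + number of decision keywords."""
    return 1 + len(DECISION_RE.findall(STRING_RE.sub("''", sql)))


def rank_by_complexity(queries):
    """Return (name, score) pairs, most complex first."""
    scored = [(name, decision_score(sql)) for name, sql in queries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical queries, used only to show the ranking.
queries = {
    "all_users": "SELECT * FROM users",
    "adult_locals": "SELECT name FROM users WHERE age > 18 AND city = 'Paris'",
    "order_report": (
        "SELECT id FROM orders "
        "WHERE (status = 'open' OR status = 'held') "
        "AND total > 100 AND region = 'EU'"
    ),
}

for name, score in rank_by_complexity(queries):
    print(f"{score:3d}  {name}")
```

The absolute values mean little on their own; the ordering is what tells you where to spend testing effort first.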

Ira Baxter answered Sep 22 '22 13:09