Select distinct values from multiple columns in same table

Tags: sql, union, distinct

I am trying to construct a single SQL statement that returns unique, non-null values from multiple columns all located in the same table.

    SELECT distinct tbl_data.code_1 FROM tbl_data
          WHERE tbl_data.code_1 is not null
    UNION
    SELECT tbl_data.code_2 FROM tbl_data
          WHERE tbl_data.code_2 is not null;

For example, tbl_data is as follows:

    id   code_1    code_2
    ---  --------  ----------
    1    AB        BC
    2    BC
    3    DE        EF
    4              BC

For the above table, the SQL query should return all unique non-null values from the two columns, namely: AB, BC, DE, EF.

I'm fairly new to SQL. My statement above works, but is there a cleaner way to write this SQL statement, since the columns are from the same table?

208

asked Jul 02 '12 23:07

regulus

1 Answer

It's better to include code in your question, rather than ambiguous text data, so that we are all working with the same data. Here is the sample schema and data I have assumed:

    CREATE TABLE tbl_data (
      id INT NOT NULL,
      code_1 CHAR(2),
      code_2 CHAR(2)
    );

    INSERT INTO tbl_data (
      id,
      code_1,
      code_2
    )
    VALUES
      (1, 'AB', 'BC'),
      (2, 'BC', NULL),
      (3, 'DE', 'EF'),
      (4, NULL, 'BC');

As Blorgbeard commented, the DISTINCT clause in your solution is unnecessary because the UNION operator eliminates duplicate rows. There is a UNION ALL operator that does not eliminate duplicates, but it is not appropriate here because you want unique values.
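For comparison, this is roughly what UNION ALL would give on the sample data above. It is only a sketch to illustrate the difference; the row order is not guaranteed without an ORDER BY.

```sql
-- UNION ALL keeps every row from both inputs, duplicates included.
SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION ALL
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;
-- Six rows on the sample data: AB, BC, DE from code_1
-- and BC, EF, BC from code_2, so BC appears three times.
```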

Rewriting your query without the DISTINCT clause is a fine solution to this problem:

    SELECT code_1
    FROM tbl_data
    WHERE code_1 IS NOT NULL
    UNION
    SELECT code_2
    FROM tbl_data
    WHERE code_2 IS NOT NULL;

It doesn't matter that the two columns are in the same table. The solution would be the same even if the columns were in different tables.
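As an illustration, here is the same pattern with the two columns split across two tables. The tables tbl_first and tbl_second are hypothetical and are not part of the question's schema.

```sql
-- Hypothetical tables, invented only for this example.
CREATE TABLE tbl_first (
  id INT NOT NULL,
  code_1 CHAR(2)
);

CREATE TABLE tbl_second (
  id INT NOT NULL,
  code_2 CHAR(2)
);

-- The shape of the query is unchanged; only the FROM clauses differ.
-- The result column takes its name (code_1) from the first SELECT.
SELECT code_1
FROM tbl_first
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_second
WHERE code_2 IS NOT NULL;
```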

If you don't like the redundancy of specifying the same filter clause twice, you can encapsulate the union query in a virtual table before filtering that:

    SELECT code
    FROM (
      SELECT code_1
      FROM tbl_data
      UNION
      SELECT code_2
      FROM tbl_data
    ) AS DistinctCodes (code)
    WHERE code IS NOT NULL;
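Not every SQL dialect accepts a column list after a derived-table alias, as in DistinctCodes (code). A variant that names the column inside the subquery instead might look like this; treat it as a sketch that should return the same rows on SQL Server, not as a tested replacement.

```sql
SELECT code
FROM (
  -- The alias on the first SELECT names the column of the whole union.
  SELECT code_1 AS code
  FROM tbl_data
  UNION
  SELECT code_2
  FROM tbl_data
) AS DistinctCodes
WHERE code IS NOT NULL;
```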

I find the syntax of the second query uglier, but it is logically neater. Which one performs better?

I created a sqlfiddle that demonstrates that the query optimizer of SQL Server 2005 produces the same execution plan for the two different queries:

[Image: https://i.stack.imgur.com/Yf0Nu.png. The query optimizer produces this execution plan for both queries: two table scans, a concatenation, a distinct sort, and a select.]

If SQL Server generates the same execution plan for two queries, then they are practically as well as logically equivalent.

Compare the above to the execution plan for the query in your question:

[Image: https://i.stack.imgur.com/CgT5e.png. Execution plan for the query in the question, showing the extra sort operation.]

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation, because the query optimizer does not know that any duplicates filtered out by the DISTINCT in the first SELECT would be filtered out by the UNION later anyway.

This query is logically equivalent to the other two, but the redundant operation makes it less efficient. On a large data set, I would expect your query to take longer to return a result set than the two here. Don't take my word for it; experiment in your own environment to be sure!
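One way to run that experiment on SQL Server is to turn on the built-in statistics output and run the queries back to back. This is a rough sketch: the timings depend on your data, indexes, and caching, and the sample table is far too small to show a meaningful difference.

```sql
-- Report I/O and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Query from the question, with the redundant DISTINCT.
SELECT DISTINCT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

-- Same query without DISTINCT.
SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Compare the figures reported for each statement, and repeat the run a few times so that caching does not skew the comparison.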

169

answered Sep 22 '22 12:09

Iain Samuel McLean Elder
