Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UNION versus SELECT DISTINCT and UNION ALL Performance

Is there any difference between these two performance-wise?

-- eliminate duplicates using UNION
SELECT col1,col2,col3 FROM Table1 
UNION SELECT col1,col2,col3 FROM Table2 
UNION SELECT col1,col2,col3 FROM Table3 
UNION SELECT col1,col2,col3 FROM Table4 
UNION SELECT col1,col2,col3 FROM Table5       
UNION SELECT col1,col2,col3 FROM Table6       
UNION SELECT col1,col2,col3 FROM Table7       
UNION SELECT col1,col2,col3 FROM Table8       

-- eliminate duplicates using DISTINCT    
SELECT DISTINCT * FROM
(     
    SELECT col1,col2,col3 FROM Table1 
    UNION ALL SELECT col1,col2,col3 FROM Table2 
    UNION ALL SELECT col1,col2,col3 FROM Table3 
    UNION ALL SELECT col1,col2,col3 FROM Table4 
    UNION ALL SELECT col1,col2,col3 FROM Table5       
    UNION ALL SELECT col1,col2,col3 FROM Table6       
    UNION ALL SELECT col1,col2,col3 FROM Table7       
    UNION ALL SELECT col1,col2,col3 FROM Table8       
) x   
like image 288
Pradeep Kumar Avatar asked Feb 25 '16 12:02

Pradeep Kumar


People also ask

Which gives better performance UNION or UNION all?

UNION retrieves only distinct records from all queries or tables, whereas UNION ALL returns all the records retrieved by queries. Performance of UNION ALL is higher than UNION.

Which is faster UNION or distinct?

UNION ALL does return duplicates: this results in a faster query and could be useful for those who want to know what is in both SELECT statements. UNION DISTINCT is used in scenarios when we need unique based on the where conditions in the query.

Which is faster SELECT or SELECT distinct?

DISTINCT is used to filter unique records out of all records in the table. It removes the duplicate rows. SELECT DISTINCT will always be the same, or faster than a GROUP BY.

Does SELECT distinct affect performance?

Yes, the application needs to compare every record to the "distinct" records cache as it goes. You can improve performance by using an index, particularly on the numeric and date fields.


2 Answers

The difference between Union and Union all is that UNION ALL will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.

A UNION statement effectively does a SELECT DISTINCT on the results set.

If you select Distinct from Union All result set, Then the output will be equal to the Union result set.

Edit:

Performance on CPU cost:

Let me explain with Example:

I have two queries. one is Union another one is Union All

SET STATISTICS TIME ON
GO
 
select distinct * from (select * from dbo.user_LogTime
union all
select * from dbo.user_LogTime) X 
GO

SET STATISTICS TIME OFF

SET STATISTICS TIME ON
GO
 
select * from dbo.user_LogTime
union
select * from dbo.user_LogTime
GO

SET STATISTICS TIME OFF

I did run the both in same query window in SMSS. Lets see the Execution Plan in SMSS:

The Execution Plan

What happens is, The query with Union All and Distinct will take CPU cost more than Query with Union.

Performance on Time:

UNION ALL:

(1172 row(s) affected)

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 39 ms.

UNION:

(1172 row(s) affected)

SQL Server Execution Times:
   CPU time = 10 ms,  elapsed time = 25 ms.

So Union is much better than the Union All with Distinct in performance-wise

like image 140
Sankar Avatar answered Oct 18 '22 20:10

Sankar


Comparison between UNION DISTINCT and UNION ALL

This query is used to create an extended employee table with additional alternate IDs for downstream system. This example comes from a mySQL 8.0.20 enviornment.

For the data and query shown below, tests yielded a significant difference:

UNION ALL       8.983 sec  
UNION DISTINCT 15.344 sec

To show scale and complexity for this example, table sizes and the query code are shown below

hqsource 600K records
accountingemppos 180K 
accountingposld 200K
emp_no_accountingnumeric 20
First UNION block is approx 550K records, second approx 50K

               SELECT a.`emp_no_imported` AS `emp_no`,
              a.`supervisor_emp_no`,
              a.`first name`,
              a.`middle name`,
              a.`last name`,
              a.`jobtitle`,
              a.`status`,
              CASE WHEN rida.`accounting_emp_no` IS NOT NULL THEN
                    rida.`accounting_emp_no`
              ELSE 
                    a.`emp_no_imported`
              END AS `accounting_id`,
              CASE WHEN epfp.`emp_no` IS NOT NULL THEN
                        CASE WHEN `sridf`.`emp_no` IS NOT NULL THEN
                                `sridf`.`accounting_emp_no`
                            ELSE
                                epfp.`emp_no`
                            END
                    ELSE
                        CASE WHEN epp.`emp_no` IS NOT NULL THEN
                            CASE WHEN `srids`.`emp_no` IS NOT NULL THEN
                                `srids`.`accounting_emp_no`
                            ELSE                                        
                                epp.`emp_no`
                            END
                    ELSE
                        CASE WHEN `srida`.`emp_no` IS NOT NULL THEN
                            `srida`.`accounting_emp_no`
                        ELSE
                            a.`supervisor_emp_no`
                        END
                    END
                END AS `accounting_s_emp_no`,
                ep.`emp_no` AS `traas_emp_no`,
                epp.`emp_no` AS `traas_parent_emp_no`
       FROM `hqsource`.hq_people a
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `rida` ON `rida`.emp_no = a.`emp_no_imported`
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `srida` ON `srida`.emp_no = a.`supervisor_emp_no`

       LEFT OUTER JOIN `traas`.`accountingemppos_data_extract` ep  ON ep.`emp_no` = a.`emp_no_imported` AND ep.`End` = '2899-12-31' AND ep.`Primary` = 'Y'
       LEFT OUTER JOIN `epe`.`accountingposld_data_extract` p ON p.`RangeGID` = ep.`GID`
       LEFT OUTER JOIN `traas`.`accountingemppos_data_extract` epp  ON epp.`GID` = p.`ParentGID` AND epp.`End` = '2899-12-31' AND epp.`Primary` = 'Y'
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `rids` ON `rids`.emp_no = ep.`emp_no`
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `srids` ON `srids`.emp_no = epp.`emp_no` AND epp.`End` = '2899-12-31' AND epp.`Primary` = 'Y'

       LEFT OUTER JOIN `epe`.`accountingemppos_data_extract_filtered` epf  ON epf.`emp_no` = a.`emp_no_imported` AND epf.`End` = '2899-12-31' AND epf.`Primary` = 'Y'
       LEFT OUTER JOIN `epe`.`accountingposld_data_extract` pf ON pf.`RangeGID` = epf.`GID` 
       LEFT OUTER JOIN `epe`.`accountingemppos_data_extract_filtered` epfp  ON epfp.`GID` = pf.`ParentGID` AND epfp.`End` = '2899-12-31' AND epfp.`Primary` = 'Y'
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `ridf` ON `ridf`.emp_no = epf.`emp_no` AND epf.`End` = '2899-12-31' AND epf.`Primary` = 'Y'
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `sridf` ON `sridf`.emp_no = epfp.`emp_no` AND epfp.`End` = '2899-12-31' AND epfp.`Primary` = 'Y'
       WHERE a.`emp_no_imported` REGEXP ('^[a-z]{2}\\d{5}.$') 

UNION ALL
-- UNION DISTINCT

    SELECT a.`emp_no_imported` AS `emp_no`, a.`supervisor_emp_no` AS `s_emp_no`, u.`First_Name`, 'ƒ' AS `MI`, u.`Last_Name`, u.`Job_Title`, NULL AS `status`, 
          CASE WHEN rid.`accounting_emp_no` IS NULL THEN
            ep.`emp_no`
          ELSE 
            rid.`accounting_emp_no`
          END AS `accounting_emp_no`,
          CASE WHEN `srid`.`accounting_emp_no` IS NULL THEN
            epp.`emp_no`
          ELSE
            `srid`.`accounting_emp_no`
          END AS `accounting_s_emp_no`,
          ep.`emp_no` AS `traas_emp_no`,
          epp.`emp_no` AS `traas_parent_emp_no`
    FROM `epe`.`accountingemppos_data_extract_filtered`  ep

    LEFT OUTER JOIN  `hqsource`.`hq_people` a ON a.`emp_no_imported` = ep.`emp_no`
    LEFT OUTER JOIN `epe`.`accountingposld_data_extract` p ON p.`RangeGID` =  ep.`GID`
    LEFT OUTER JOIN `epe`.`accountingemppos_data_extract_filtered` epp ON epp.`GID` = p.`ParentGID`
    LEFT OUTER JOIN `siebel`.`users_all_output` u ON u.`LOGIN` = ep.`emp_no`
    LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `rid` ON `rid`.emp_no = ep.`emp_no`
    LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `srid` ON `srid`.emp_no = epp.`emp_no`
    WHERE 
        ep.`End` = '2899-12-31' AND
        epp.`End` = '2899-12-31' AND
        p.`End` = '2899-12-31' AND
        ep.emp_no REGEXP ('^F\\d{8}$|^V[0-3]\\d{5}$')
        
ORDER BY LENGTH(accounting_emp_no) ASC, accounting_emp_no ASC

;

The WHERE clauses in each of the two UNION blocks guarantee the results will be unique. (This query is years old and runs daily. I wish I had tried this sooner). Field names have been obfuscated

like image 2
wistlo Avatar answered Oct 18 '22 19:10

wistlo