Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove duplicating rows from union statement

OK - I have looked and looked and found a lot of examples but nothing quite meeting my need. Maybe I used the wrong words to search with, but I could use your help. I will provide as much detail as I can.

I need to produce a report that merges fields from two tables, or rather a view and a table, into one table for a report. Here is the statement I am using:

SELECT A.ConfInt, A.Conference, 
        NULL as Ordered,
        NULL as Approved,
        NULL as PickedUp,
        SUM(dbo.Case_Visit_Payments.Qty) AS Qty
FROM         dbo.Conferences as A INNER JOIN
                      dbo.Case_Table ON A.ConfInt = dbo.Case_Table.Conference_ID INNER JOIN
                      dbo.Case_Visit_Payments ON dbo.Case_Table.Case_ID = dbo.Case_Visit_Payments.Case_ID
WHERE     (dbo.Case_Visit_Payments.Item_ID = 15 AND A.ProjectCool = 1)
GROUP BY A.Conference, A.ConfInt
UNION
SELECT  B.ConfInt, 
        B.Conference, 
        SUM(dbo.Cool_Fan_Order.NumberOfFansRequested) AS Ordered, 
        SUM(dbo.Cool_Fan_Order.Qty_Fans_Approved) AS Approved, 
        SUM(dbo.Cool_Fan_Order.Qty_Fans_PickedUp) AS PickedUp, 
        NULL AS Qty
FROM         dbo.Conferences as B LEFT OUTER JOIN
                      dbo.Cool_Fan_Order ON B.ConfInt = dbo.Cool_Fan_Order.Conference_ID
where B.ProjectCool = 1
GROUP BY B.Conference, B.ConfInt

And here are the results:

4   Our Lady        NULL    NULL    NULL    11
4   Our Lady        40      40      40      NULL
7   Holy Rosary     20      20      20      NULL
11  Little Flower   NULL    NULL    NULL    21
11  Little Flower   5       5       20      NULL
19  Perpetual Help  NULL    NULL    NULL    2
19  Perpetual Help  20      20      20      NULL

What I would strongly prefer is to not have the duplicating rows, such as:

4   Our Lady        40      40      40      11
7   Holy Rosary     20      20      20      NULL
11  Little Flower   5       5       20      21
19  Perpetual Help  20      20      20      2

I hope this question was clear enough. Any Suggestions would be greatly appreciated. And I do mark as answered. :)

Gregory

like image 429
Gregory Hampton Avatar asked Jun 22 '12 15:06

Gregory Hampton


People also ask

How do you remove duplicates in UNION?

This SQL UNION ALL example would return the supplier_id multiple times in the result set if that same value appeared in both the suppliers and orders table. The SQL UNION ALL operator does not remove duplicates. If you wish to remove duplicates, try using the UNION operator.

How do I remove duplicate rows from a query?

The go to solution for removing duplicate rows from your result sets is to include the distinct keyword in your select statement. It tells the query engine to remove duplicates to produce a result set in which every row is unique.

Does UNION all eliminate duplicates?

The UNION operator removes eliminate duplicate rows, whereas the UNION ALL operator does not. Because the UNION ALL operator does not remove duplicate rows, it runs faster than the UNION operator. The following are rules to union data: The number of columns in all queries must be the same.

What does UNION do with duplicates?

UNION ALL keeps all of the records from each of the original data sets, UNION removes any duplicate records. UNION first performs a sorting operation and eliminates of the records that are duplicated across all columns before finally returning the combined data set.


2 Answers

you could use your actual query as a subQuery, use an aggregate function (MAX OR SUM) on your non-duplicated values and group by the non aggregated columns

SELECT ConfInt, Conference, MAX(Ordered), MAX(Approved), MAX(PickedUp), MAX(Qty)
FROM (<your actualQuery>)
GROUP BY ConfInt, Conference.
like image 138
Raphaël Althaus Avatar answered Oct 06 '22 00:10

Raphaël Althaus


The quick answer is to wrap your query inside another one,

SELECT ConfInt
     , Conference
     , SUM(Ordered) AS Ordered
     , SUM(Approved) As Approved
     , SUM(PickedUp) AS PickedUp
     , SUM(Qty) AS Qty
  FROM (

       <your UNION query here>  

      )
 GROUP BY ConfInt, Conference

This is not the only way to achieve the result set, but its the quickest fix to meet the specified requirements.

As an alternative, I believe these queries will return equivalent results:

We could use a correlated subquery in the SELECT list to get Qty:

;WITH q AS
      ( SELECT B.ConfInt
             , B.Conference
             , SUM(o.NumberOfFansRequested) AS Ordered
             , SUM(o.Qty_Fans_Approved) AS Approved
             , SUM(o.Qty_Fans_PickedUp) AS PickedUp
          FROM dbo.Conferences as B
          LEFT
          JOIN dbo.Cool_Fan_Order o ON o.Conference_ID = B.ConfInt
         WHERE B.ProjectCool = 1
         GROUP BY B.ConfInt, B.Conference
      )
 SELECT q.ConfInt
      , q.Conference
      , q.Ordered
      , q.Approved
      , q.PickedUp
      , ( SELECT SUM(v.Qty)
            FROM dbo.Case_Table t
            JOIN dbo.Case_Visit_Payments v ON v.Case_ID = t.Case_ID
           WHERE t.Conference_ID = q.ConfInt
             AND v.Item_ID = 15
        ) AS Qty
   FROM q
  ORDER BY q.ConfInt, q.Conference

Or, we could use LEFT JOIN operation on the two queries, rather than UNION. (We know that the query referencing Cool_Fan_Order can be the LEFT side of the outer join, because we know that it returns at least as many rows as the other query. (Basically, we know that the other query can't return values of ConfInt and Conference that aren't in the Cool_Fan_Order query.)

;WITH p AS 
      ( SELECT A.ConfInt
             , A.Conference
             , SUM(v.Qty) AS Qty
          FROM dbo.Conferences as A
          JOIN dbo.Case_Table t ON t.Conference_ID = A.ConfInt
          JOIN dbo.Case_Visit_Payments v ON v.Case_ID = t.Case_ID
         WHERE A.ProjectCool = 1
           AND v.Item_ID = 15
         GROUP BY A.ConfInt, A.Conference
      )
    , q AS
      ( SELECT B.ConfInt
             , B.Conference
             , SUM(o.NumberOfFansRequested) AS Ordered
             , SUM(o.Qty_Fans_Approved) AS Approved
             , SUM(o.Qty_Fans_PickedUp) AS PickedUp
          FROM dbo.Conferences as B
          LEFT
          JOIN dbo.Cool_Fan_Order o ON B.ConfInt = o.Conference_ID
         WHERE B.ProjectCool = 1
         GROUP BY B.ConfInt, B.Conference
      )
 SELECT q.ConfInt
      , q.Conference
      , q.Ordered
      , q.Approved
      , q.PickedUp
      , p.Qty
   FROM q
   LEFT
   JOIN p ON p.ConfInt = q.ConfInt AND p.Conference = q.Conference
  ORDER BY q.ConfInt, q.Conference

The choice between those three (they all return an equivalent resultset under all conditons), boils down to readability and maintainability, and performance. On large enough rowsets, there may be some observable performance differences between the three statements.

like image 40
spencer7593 Avatar answered Oct 06 '22 01:10

spencer7593