Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

FULL OUTER JOIN duplicating rows using COALESCE

It was a long day, perhaps this is a simple question but i'm stuck anyway.

Basically i have two similar tables Sales and Forecasts. I'm trying to create a view which selects rows from both tables and picks whatever is there for a given model+month+country. If both tables contain data, Sales has priority which means that Forecast rows should be omitted.

To simplify the query i'm using CTE's. Actually the schema of both tables is different and many tables are joined, also Forecasts contains history rows where only the last should be shown.

I have created a simplified schema and data to show you what i'm trying to do:

WITH Sales AS
(
    SELECT 
        ID, Model, Month, Country,
        Amount              = Count,
        [Forecast / Sales]  = 'Sales'
    FROM dbo.Sales
)
, Forecasts AS
(
    SELECT 
        ID, Model, Month, Country,
        Amount              = Count,
        [Forecast / Sales]  = 'Forecast'
    FROM dbo.Forecast
)
SELECT  ID = COALESCE(s.ID, fc.ID), 
        Model = COALESCE(s.Model, fc.Model), 
        Month = COALESCE(s.Month, fc.Month),
        Country = COALESCE(s.Country, fc.Country),
        Amount = COALESCE(s.Amount, fc.Amount),
        [Forecast / Sales] = COALESCE(s.[Forecast / Sales], fc.[Forecast / Sales])
FROM Sales s
FULL OUTER  JOIN Forecasts fc 
    ON s.Model = fc.Model
        AND s.Month = fc.Month
        AND s.Country = fc.Country
ORDER BY ID,Month,Country,Model

Here's a sql-fiddle with sample data: http://sqlfiddle.com/#!3/9081b/9/2

Result:

ID  MODEL   MONTH   COUNTRY AMOUNT  FORECAST / SALES
1   ABC December, 01 2013 00:00:00+0000 Germany 777 Sales
2   ABC January, 01 2014 00:00:00+0000  Germany 999 Sales
3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales
3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales
4   ABC January, 01 2014 00:00:00+0000  UK  600 Forecast
4   ABC February, 01 2014 00:00:00+0000 UK  444 Sales
5   ABC March, 01 2014 00:00:00+0000    UK  500 Forecast

This query returns duplicates according to the ID and the source (last column).

3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales
3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales

Apparently the Sales rows are being duplicated by multiple Forecast-rows for that model+month+country combination. How do i get only Sales rows if Sales+Forecast rows are available without duplicates and Forecast rows if there are no Sales rows?

like image 776
Tim Schmelter Avatar asked Apr 09 '14 14:04

Tim Schmelter


People also ask

Does full outer join gives duplicate rows?

From what you are saying, the 2 tables you are comparing are more or less the same, and full outer join giving you records from both tables, chances are you are going to get a lot of duplicates. So, that's the logic behind it.

Can we use coalesce in join condition?

A COALESCE function returns the first non-NULL expression from a specified list. Usually, we use COALESCE as one of the elements in the select list, however, it can be successfully used in the join conditions too.

How do you remove duplicates using joins?

Solution. Select column values in a specific order within rows to make rows with duplicate sets of values identical. Then you can use SELECT DISTINCT to remove duplicates. Alternatively, retrieve rows in such a way that near-duplicates are not even selected.

Why my inner join returns duplicate rows?

Unwanted rows in the result set may come from incomplete ON conditions. In some cases, you need to join tables by multiple columns. In these situations, if you use only one pair of columns, it results in duplicate rows.


1 Answers

The problem with your query isn't the use of COALESCE, but simply with the JOIN. There are 2 rows in the Forecast table that have the same combination of Model, Month, Country, rows with ID 2 and 3:

╔════╦═══════╦═════════════════════════╦═════════╦═══════╗
║ ID ║ Model ║          Month          ║ Country ║ Count ║
╠════╬═══════╬═════════════════════════╬═════════╬═══════╣
║  2 ║ ABC   ║ 2014-02-01 00:00:00.000 ║ Germany ║  1100 ║
║  3 ║ ABC   ║ 2014-02-01 00:00:00.000 ║ Germany ║   900 ║
╚════╩═══════╩═════════════════════════╩═════════╩═══════╝

Both of them join with the row ID 3 from the Sales table:

╔════╦═══════╦═════════════════════════╦═════════╦═══════╗
║ ID ║ Model ║          Month          ║ Country ║ Count ║
╠════╬═══════╬═════════════════════════╬═════════╬═══════╣
║  3 ║ ABC   ║ 2014-02-01 00:00:00.000 ║ Germany ║   900 ║
╚════╩═══════╩═════════════════════════╩═════════╩═══════╝

And since your query is using COALESCE(s.ID, fc.ID), then you get 2 rows with ID 3 in the results

like image 97
Lamak Avatar answered Sep 21 '22 10:09

Lamak