Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

T-SQL - Joining multiple tables results in duplicate rows

I am struggeling to get the following result from below sample Fiddle.

Result:

enter image description here

Source tables:

enter image description here

Fiddle

The idea is to create a single row for each RecordID by finding the max count of Budget and Forecast IDs to create the other columns (leaving columns cells empty if no value for that Budget or Forecast ID exists).

I tried with PIVOT function but couldn't get even near a decent result.

UPDATE: Please see below image where I tried to explain the expected output:

In words: for each BudgetID that belongs to a RecordID create separate columns for BDate, Result (Percentage * BAmount of Records table) and Status.

In my example RecordID 55 has two entries in the Budget table - therefore 2x3 columns are needed to show each date, result and status in its own column in a single row for this RecordID.

Since RecordID 77 has the most (three) entries in the Budget table it is used to create 3x3 columns over all rows.

The same goes for Forecast.

output

I hope you can help me.

Thank you.

like image 549
ratanmalko Avatar asked Jul 04 '18 13:07

ratanmalko


1 Answers

Good day,

Note! My goal is to help the reader to learn and not to provide final query as a solution for his homework. Therefore I will present the solution in two steps, and I will add several "PRINT" command in the dynamic solution, so the reader will have option to check the intermediate step in the work.

The DDL+DML provided by the OP:

DROP TABLE IF EXISTS Budget;
CREATE TABLE Budget
    (BudgetID int, RecordID int, BDate date,Percentage int, [Status] varchar(50));
INSERT INTO Budget
    (BudgetID, RecordID, BDate,Percentage,Status)
VALUES
    (1, 55, '2017-01-01', 60, 'ordered'),
    (2, 55, '2017-03-24', 40, 'ordered'),
    (3, 66, '2018-08-15', 100, 'invoiced'),
    (4, 77, '2018-12-02', 25, 'paid'),
    (5, 77, '2018-09-10', 35, 'ordered'),
    (6, 77, '2019-07-13', 40, 'ordered')
GO

DROP TABLE IF EXISTS Forecast;
CREATE TABLE Forecast
    (ForecastID int, RecordID int, FDate date, Percentage int);
INSERT INTO Forecast
    (ForecastID, RecordID, FDate,Percentage)
VALUES
    (1, 55, '2020-12-01', 100),
    (2, 77, '2023-05-17', 25),
    (3, 77, '2024-11-28', 75)
GO

DROP TABLE IF EXISTS Records;
CREATE TABLE Records
    (RecordID int, BAmount int, FAmount int, Name varchar(40), Description varchar(40) )
;
INSERT INTO Records
   (RecordID, BAmount,FAmount,Name, Description)
VALUES
    (55, 15000, 33000, 'Prod1', 'Desc1' ),
     (66, 22000, 17500, 'Prod2', 'Desc2' ),
    (77, 40000, 44000, 'Prod3', 'Desc3' )
GO

select * from Budget
select * from Forecast
select * from Records

Let's first show a simple static solution

This base on the knowledge that we have maximum of three rows for each RecordID in the Budget table and maximum of two rows for each RecordID in the Forecast table. This will help to understand the dynamic solution I will show next

;With CteBudget as (
    select 
        b.BDate, b.BudgetID, b.Percentage, b.RecordID, b.Status
        ,RN = ROW_NUMBER() OVER (partition by b.RecordID order by b.BudgetID)
    from Budget b
),
CteForecast as (
    select 
        f.FDate, f.ForecastID, f.Percentage, f.RecordID
        ,RN = ROW_NUMBER() OVER (partition by f.RecordID order by f.ForecastID)
    from Forecast f
)
select 
    r.RecordID, r.Name, r.Description, 

    b1.BDate BDate1, (b1.Percentage * r.BAmount)/100 BResult1, b1.Status BStatus1, 
    b2.BDate BDate2, (b2.Percentage * r.BAmount)/100 BResult2, b2.Status BStatus2,
    b3.BDate BDate3, (b3.Percentage * r.BAmount)/100 BResult3, b3.Status BStatus3, 

    f1.FDate FDate1, (f1.Percentage * r.BAmount)/100 FResult1,
    f2.FDate FDate2, (f2.Percentage * r.BAmount)/100 FResult2

from Records r
left join CteBudget b1 on r.RecordID = b1.RecordID and b1.RN = 1
left join CteBudget b2 on r.RecordID = b2.RecordID and b2.RN = 2
left join CteBudget b3 on r.RecordID = b3.RecordID and b3.RN = 3
left join CteForecast f1 on r.RecordID = f1.RecordID and f1.RN = 1
left join CteForecast f2 on r.RecordID = f2.RecordID and f2.RN = 2
--where r.RecordID = 77
GO

Note! For static solution and without indexes (as I will add in the end), the above solution is VERY bad regarding performance, but once we add the right index and as a base for a dynamic solution this option should fit well.

Now we can present the dynamic solution.

-- Get number of columns
Declare @NumBudget tinyint
Declare @NumForecast tinyint
SELECT @NumBudget = MAX(C) FROM (
    select COUNT(RecordID) C
    from Budget
    GROUP BY RecordID
) t
SELECT @NumForecast = MAX(C) FROM (
    select COUNT(RecordID) C
    from Forecast
    GROUP BY RecordID
) t
---------------------------------------------
DECLARE @SQLString1 nvarchar(MAX) = '';
DECLARE @SQLString2 nvarchar(MAX) = '';
DECLARE @loop int = 1;
WHILE @loop <= @NumBudget BEGIN 
    SET @SQLString1 = @SQLString1 + N'
    b' + CONVERT(VARCHAR(2),@loop) + '.BDate BDate' + CONVERT(VARCHAR(2),@loop) + ', (b' + CONVERT(VARCHAR(2),@loop) + '.Percentage * r.BAmount)/100 BResult' + CONVERT(VARCHAR(2),@loop) + ', b' + CONVERT(VARCHAR(2),@loop) + '.Status BStatus' + CONVERT(VARCHAR(2),@loop) + ', '

    SET @SQLString2 = @SQLString2 + N'
    left join CteBudget b' + CONVERT(VARCHAR(2),@loop) + ' on r.RecordID = b' + CONVERT(VARCHAR(2),@loop) + '.RecordID and b' + CONVERT(VARCHAR(2),@loop) + '.RN = 1'

    SET @loop = @loop + 1
END
SET @loop = 1

WHILE @loop <= @NumForecast BEGIN  
    SET @SQLString1 = @SQLString1 + N'
    f' + CONVERT(VARCHAR(2),@loop) + '.FDate FDate' + CONVERT(VARCHAR(2),@loop) + ', (f' + CONVERT(VARCHAR(2),@loop) + '.Percentage * r.BAmount)/100 FResult' + CONVERT(VARCHAR(2),@loop) + ','

    SET @SQLString2 = @SQLString2 + N'
    left join CteForecast f' + CONVERT(VARCHAR(2),@loop) + ' on r.RecordID = f' + CONVERT(VARCHAR(2),@loop) + '.RecordID and f' + CONVERT(VARCHAR(2),@loop) + '.RN = 1'

    SET @loop = @loop + 1
END
SET @SQLString1 = STUFF (@SQLString1, LEN(@SQLString1) , 1 , '')  
PRINT '/************************************************/'
PRINT @SQLString1
PRINT @SQLString2
PRINT '/************************************************/'

DECLARE @SQLString nvarchar(MAX);
SET @SQLString = N'
;With CteBudget as (
    select 
        b.BDate, b.BudgetID, b.Percentage, b.RecordID, b.Status
        ,RN = ROW_NUMBER() OVER (partition by b.RecordID order by b.BudgetID)
    from Budget b
),
CteForecast as (
    select 
        f.FDate, f.ForecastID, f.Percentage, f.RecordID
        ,RN = ROW_NUMBER() OVER (partition by f.RecordID order by f.ForecastID)
    from Forecast f
)
select 
    r.RecordID, r.Name, r.Description, 
'
+ @SQLString1
+ N'
from Records r'
+ @SQLString2

print @SQLString

EXECUTE sp_executesql @SQLString
GO

IMPORTANT! This solution is not necessarily the one that gives the best performance, but the one that is probably the simplest to follow and understand. On production once we will have the real DDL+DML and the server will have the statistics, then we will be able to improve the performance and chose best solution for our specific case.

Indexes

Note! The above solution might lead to a lot of sorting of the data, and having the right indexes is HIGHLY important here! It is important to test several different options and select the best.

For the sake of the forum (or whatever you call stackoverflow which in my opinion is not a discussion forum interface but Q&A interface), I add queries to create the CLUSTERED INDEX which I assume you have in the production, and an optional NONCLUSTERED INDEX which you should test (I did not tested other options and this is imply the first that came to my mind, so it is recommended to keep check for the right index with the real DDL+DML).

-- CLUSTERED INDEX
CREATE CLUSTERED INDEX IX_Budget_BudgetID
    ON dbo.Budget (BudgetID);   
GO 
CREATE CLUSTERED INDEX IX_Forecast_ForecastID
    ON dbo.Forecast (ForecastID);   
GO 
CREATE CLUSTERED INDEX IX_Records_RecordID  
    ON dbo.Records (RecordID);   
GO 

-- NONCLUSTERED INDEX
CREATE NONCLUSTERED INDEX NX_Budget_RecordID_BudgetID
    ON dbo.Budget (RecordID,BudgetID);   
GO 
CREATE NONCLUSTERED INDEX NX_Forecast_RecordID_ForecastID
    ON dbo.Forecast (RecordID,ForecastID);   
GO 
CREATE NONCLUSTERED INDEX NX_Records_RecordID_RecordID  
    ON dbo.Records (RecordID);   
GO 
like image 72
Ronen Ariely Avatar answered Sep 28 '22 19:09

Ronen Ariely