Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Logic for a complex query which involves grouping and average in SQL

Tags:

sql

sql-server

I have 2 tables.
Below are the steps I need to follow to get my desired output. I could follow upto Step 3 .Kindly help me as it is a little complex and Im unable to understand how to proceed further.

Table 1

Site  Code FailFlag  Comments  ModifiedDate  ModifiedBy
ABT   A01     F      Dfasdf    10/11/2011    Anna
ABT   A01     F      dsfsdf    15/12/2012    Mand
ABT   A01            Rds       30/03/2011    Tim
ABT   A01            GHDs      02/12/2012    Andy
ABT   A02     F      dfd       09/05/2012    Anna
ABT   A02            sdada     11/02/2013    Kathy
ABT   A02            Dfg       15/05/2011    Rob
AFL   A02     F      asda      13/02/2011    Dan
AFL   A02            dsaa      24/12/2010    Ryan
TRG   A01            sdasd     16/04/2010    Richard
TRG   K05            jksdh     23/04/2012    Mark
KLD   K05     F      sd        18/05/2013    Jim
KLD   K05            dsfsd     10/03/2012    James
KLD   K05            sdsd      12/05/2011    Luther
KTY   K05     F      saq       09/09/2012    Ryan
KTY   K05            asd       04/04/2010    Kathy
KMD   C02     F      nas       29/02/2012    Rob
KMD   C02            asda      11/11/2011    Andy

Table 2 :

Site  Code   Freq     StartDate   EndDate
ABT   A01    43       01/01/2011  01/02/2012
ABT   A02    254      01/01/2011  19/02/2011
ABT   A02    109      20/02/2011  01/01/2012
ABT   A02    12       02/01/2012  01/01/2013
AFL   A02    13       01/01/2011  01/02/2012
TRG   A01    122      01/01/2011  01/02/2012
TRG   K05    61       01/01/2011  01/02/2012
KLD   KO5    33       01/01/2011  15/05/2012
KLD   K05    79       16/05/2012  01/01/2013
KTY   K05    52       01/01/2011  01/02/2012
KMD   C02    78       01/01/2011  01/02/2012
ZYT   G01    11       01/01/2011  01/02/2012
PYN   A01    15       01/01/2011  01/02/2012
DYN   F08    122      01/01/2011  01/02/2012

Steps :

  1. The combination of two columns ‘Site’ and ‘Code’ in Table 1 are looked upon the combination of the columns ‘Site’ and ‘Code’ in Tabel2.

  2. Filter the same on the ‘Failure’ column and find out the number of failures

    Below is the query and the output:

    SELECT Site,Code,COUNT(*) as [Count],
    FailFlagCount= SUM(CASE WHEN F = 'F' THEN 1 ELSE 0 END) FROM Table1 
    GROUP BY Site,Det
    
    
    Site Code    Count   FailFlagCount
    ABT   A01     4      2
    ABT   A02     3      1
    AFL   A02     2      1
    TRG   A01     1      0
    TRG   K05     1      0      
    KLD   KO5     3      1
    KTY   K05     2      1
    KMD   C02     2      1
    
  3. We check for the same combination in Table 2. i.e., the Site and Code of the step 2 output are looked into Table 2 to get its Frequency

  4. Calculations:

    a. CC % = [1-(FailCount / Count)]*100 = [1-(2/4)]*100

    b. B.P.O % = [1-(FailCount / Freq)]*100 = [1-(2/43)]*100

    c.Forecast% =

    Let us assume the current month is March.

    Calculate the Number of failures over the last 3 years and find the average, and let’s say as ‘X’.
    Calculate the failure for remaining months, ‘Y’ = (X * Remaining months/12)
    Total failure over 12 months = Current Failure + Y

    Hence Forecast% = [1-(Total failure over 12 months/ Freq)]*100.

    For our example, let us assume the value of X = 2 Hence Y = (2 * 9/12)

    Forecast % = [1-(1+1.5)]*100

  5. The above calculations would be done for all the possible combinations of Site and Code.

    Site    Code    CC  B.P.O   Forecast   StartDate    EndDate
    ABT   A01                                  01/01/2011  01/02/2012
    ABT   A02                                  01/01/2011  19/02/2011
    ABT   A02                                  20/02/2011  01/01/2012
    ABT   A02                                  02/01/2012  01/01/2013
    AFL   A02                                  01/01/2011  01/02/2012
    TRG   A01                                  01/01/2011  01/02/2012
    TRG   K05                                  01/01/2011  01/02/2012
    KLD   K05                                  01/01/2011  15/05/2012
    KLD   K05                                  16/05/2012  01/01/2013
    KTY   K05                                  01/01/2011  01/02/2012
    KMD   C02                                  01/01/2011  01/02/2012
    
  6. The grouping of the above table is done on Site type i.e. first letter of Site. Then the average of the all the calculations( cc,BPO,Forecast) are done after Grouping For example: - ‘A’ for ‘ABT’, ‘T’ for ‘TRG’. (I am assuming that we either create multiple tables for multiple types and then do a union to get the below query)
    Examples :-

    Site     Code                             CC     B.P.O    Forecast          
     A    A01  
     A    A02 
        [i.e.,Avg value of (ABT and A02) 
             and (AFL and A02)]  
     T    A01 
     T    K05 avg
        [i.e., Avg value of (KLD and K05) 
             and (KTY and K05)]  
     K    K05 
     K    C02 
    

This should be my final output. Kindly help.

Adding the scripts for the tables for making it easier to help:

Table 1 scripts:

CREATE TABLE Table1
    (
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    Site VARCHAR(5),
    Code VARCHAR(5),
    FailFlag CHAR(1),
    Comments VARCHAR(100),
    ModifiedDate  DATETIME,
    ModifiedBy   VARCHAR(50)
    );

INSERT INTO Table1 
      (Site,  Code,  FailFlag, Comments, ModifiedDate, ModifiedBy)
SELECT 'ABT', 'A01',   'F',    'Dfasdf', '10/11/2011', 'Anna'    UNION ALL
SELECT 'ABT', 'A01',   'F',    'dsfsdf', '15/12/2012', 'Mand'    UNION ALL
SELECT 'ABT', 'A01',   NULL,   'Rds',    '30/03/2011', 'Tim'     UNION ALL
SELECT 'ABT', 'A01',   NULL,   'GHDs',   '02/12/2012', 'Andy'    UNION ALL
SELECT 'ABT', 'A02',   'F',    'dfd',    '09/05/2012', 'Anna'    UNION ALL
SELECT 'ABT', 'A02',   NULL,   'sdada',  '11/02/2013', 'Kathy'   UNION ALL
SELECT 'ABT', 'A02',   NULL,   'Dfg',    '15/05/2011', 'Rob'     UNION ALL
SELECT 'AFL', 'A02',   'F',    'asda',   '13/02/2011', 'Dan'     UNION ALL
SELECT 'AFL', 'A02',   NULL,   'dsaa',   '24/12/2010', 'Ryan'    UNION ALL
SELECT 'TRG', 'A01',   NULL,   'sdasd',  '16/04/2010', 'Richard' UNION ALL
SELECT 'TRG', 'K05',   NULL,   'jksdh',  '23/04/2012', 'Mark'    UNION ALL
SELECT 'KLD', 'K05',   'F',    'sd',     '18/05/2013', 'Jim'     UNION ALL
SELECT 'KLD', 'K05',   NULL,   'dsfsd',  '10/03/2012', 'James'   UNION ALL
SELECT 'KLD', 'K05',   NULL,   'sdsd',   '12/05/2011', 'Luther'  UNION ALL
SELECT 'KTY', 'K05',   'F',    'saq',    '09/09/2012', 'Ryan'    UNION ALL
SELECT 'KTY', 'K05',   NULL,   'asd',    '04/04/2010', 'Kathy'   UNION ALL
SELECT 'KMD', 'C02',   'F',    'nas',    '29/02/2012', 'Rob'     UNION ALL
SELECT 'KMD', 'C02',   NULL,   'asda',   '11/11/2011', 'Andy';

Table 2 scripts:

CREATE TABLE Table2
    (
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    Site VARCHAR(5),
    Code VARCHAR(5),
    Freq  int,
    StartDate  DATETIME,
    EndDate  DATETIME
    );

INSERT INTO Table2 (Site, Code, Freq, StartDate, EndDate)
SELECT 'ABT', 'A01', 43,  '01/01/2011', '01/02/2012' UNION ALL
SELECT 'ABT', 'A02', 254, '01/01/2011', '19/02/2011' UNION ALL
SELECT 'ABT', 'A02', 109, '20/02/2011', '01/01/2012' UNION ALL
SELECT 'ABT', 'A02', 12,  '02/01/2012', '01/01/2013' UNION ALL
SELECT 'AFL', 'A02', 13,  '01/01/2011', '01/02/2012' UNION ALL
SELECT 'TRG', 'A01', 122, '01/01/2011', '01/02/2012' UNION ALL
SELECT 'TRG', 'K05', 61,  '01/01/2011', '01/02/2012' UNION ALL
SELECT 'KLD', 'KO5', 33,  '01/01/2011', '15/05/2012' UNION ALL
SELECT 'KLD', 'K05', 79,  '16/05/2012', '01/01/2013' UNION ALL
SELECT 'KTY', 'K05', 52,  '01/01/2011', '01/02/2012' UNION ALL
SELECT 'KMD', 'C02', 78,  '01/01/2011', '01/02/2012' UNION ALL
SELECT 'ZYT', 'G01', 11,  '01/01/2011', '01/02/2012' UNION ALL
SELECT 'PYN', 'A01', 15,  '01/01/2011', '01/02/2012' UNION ALL
SELECT 'DYN', 'F08', 122, '01/01/2011', '01/02/2012';
like image 505
SDR Avatar asked Jun 26 '13 16:06

SDR


People also ask

What is complex query in SQL with example?

Complex Queries in SQL ( Oracle ) These questions are the most frequently asked in interviews. To fetch ALTERNATE records from a table. ( EVEN NUMBERED) select * from emp where rowid in (select decode(mod(rownum,2),0,rowid, null) from emp);

How do you find the average of a group in SQL?

AVG() With a DISTINCT Clause To do this in SQL, you have to use a DISTINCT clause. You can see the annual_salary column in our employees table has values that repeat, for instance, 5000 and 10000. If we want SQL to consider these values only once, we use AVG() with a DISTINCT clause.

How do you write an average query in SQL?

The avg() function has the following syntax: SELECT AVG( column_name ) FROM table_name; The avg() function can be used with the SELECT query for retrieving data from a table.


1 Answers

"A little complex." Yes.

Starting from the top, here's your step 2. This could have a key flaw, in that it's not limiting anything to last 3 years or last 12 months. It's not clear from your question of that was the intent, so we'll use this.

SELECT Site,Code,COUNT(*) as [Count],
FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)
FROM Table1 
GROUP BY Site,Code;

Need to join that with Table2 to get the frequency field. I've wrapped that in a sub-select and called that subquery S2 because it was step 2.

SELECT s2.Site, s2.Code,s2.[Count], S2.FailFlagCount, Table2.Freq
FROM
(
SELECT Site,Code,COUNT(*) as [Count],
FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)
FROM Table1 
GROUP BY Site,Code
) S2
LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code
;

Calculating CC and BPO are then relatively easy. The above becomes a subquery called S3.

Note that I've prevented divide by zero errors here.

SELECT Site, Code, [Count], FailFlagCount, Freq,
CASE WHEN [Count]=0 THEN 0 ELSE  ( 1-(FailFlagCount / [Count]))*100 END  CC,
CASE WHEN Freq = 0 THEN 0 ELSE ( 1-(FailFlagCount / Freq))*100 END BPO

FROM
(
SELECT s2.Site, s2.Code,s2.[Count], S2.FailFlagCount, ISNULL(Table2.Freq, 1) AS Freq
FROM   
(
SELECT Site,Code,COUNT(*) as [Count],
FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)
FROM Table1 
GROUP BY Site,Code
) S2
LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code
) s3
;

Ok, a bit of experimenting here, around SQL month functions, how to calculate how many months there are left, and 1st of this year.

select GETDATE(), month(GETDATE()), 12-MONTH(GETDATE()),
DATEADD(YEAR, -3, GETDATE()),
DATEFROMPARTS (YEAR(GETDATE()),1,1)
;

So now we can calculate the Y figure - this is the whole of table1 again, but filtered for the last 36 months, then calculate the monthly average. Implicit error may creep in here when you're running this in the middle of the month - it's not breaking on month borders, it's taking the last 3 years complete not the last 36 completed months excluding this month.

Note that I had to cast count(*) as a float otherwise it's an integer, which gets rounded to 0 in the division.

SELECT SITE, CODE, count(*) tot, cast(count(*) as float)/36 avg, (cast(count(*) as float)/36) * ((12-MONTH(GETDATE()))/12) Y
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEADD(YEAR, -3, GETDATE())
GROUP BY SITE, CODE;

And we calculate failures this year in the same way.

SELECT SITE, CODE, COUNT(*) currFails 
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEFROMPARTS (YEAR(GETDATE()),1,1)
GROUP BY SITE, CODE;

So now we can combine the above 3 steps to get a table with Y and current year fails. Note the ISNULL there with the LEFT JOIN to compensate for sites that have no failures this year.

SELECT s3.Site, s3.Code, [Count], FailFlagCount, Freq,
CASE WHEN [Count]=0 THEN 0 ELSE  ( 1-(FailFlagCount / [Count]))*100 END  CC,
CASE WHEN Freq = 0 THEN 0 ELSE ( 1-(FailFlagCount / Freq))*100 END BPO,
ISNULL(YBIt.Y, 0) Y,
isnull(currBit.currFails, 0) as currFails

FROM
(
SELECT s2.Site, s2.Code,s2.[Count], S2.FailFlagCount, ISNULL(Table2.Freq, 1) AS Freq
FROM   
(
SELECT Site,Code,COUNT(*) as [Count],
FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)
FROM Table1 
GROUP BY Site,Code
) S2
LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code
) s3

LEFT JOIN
(
SELECT SITE, CODE, count(*) tot, cast(count(*) as float)/36 avg, (cast(count(*) as float)/36) * ((12-MONTH(GETDATE()))/12) Y
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEADD(YEAR, -3, GETDATE())
GROUP BY SITE, CODE
) YBit on S3.Site = Ybit.site AND S3.code = YBit.code 

LEFT JOIN
(
SELECT SITE, CODE, COUNT(*) currFails 
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEFROMPARTS (YEAR(GETDATE()),1,1)
GROUP BY SITE, CODE
) currBit ON s3.site = currBit.site and s3.code = currBit.code;

;

So we can finally calculate forecast.

SELECT Site, code, [Count], FAilFlagCount, Freq,
   CC, BPO, Y, currFails, Y+currFails totF12,
    CASE WHEN Freq = 0 then 0 else (1-(( Y+currFails)/ Freq))*100 END Forecast
 FROM 
(
SELECT s3.Site, s3.Code, [Count], FailFlagCount, Freq,
CASE WHEN [Count]=0 THEN 0 ELSE  ( 1-(FailFlagCount / [Count]))*100 END  CC,
CASE WHEN Freq = 0 THEN 0 ELSE ( 1-(FailFlagCount / Freq))*100 END BPO,
ISNULL(YBIt.Y, 0) Y,
isnull(currBit.currFails, 0) as currFails

FROM
(
SELECT s2.Site, s2.Code,s2.[Count], S2.FailFlagCount, ISNULL(Table2.Freq, 1) AS Freq
FROM   
(
SELECT Site,Code,COUNT(*) as [Count],
FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)
FROM Table1 
GROUP BY Site, Code
) S2
LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code
) s3

LEFT JOIN
(
SELECT SITE, CODE, count(*) tot, cast(count(*) as float)/36 avg, (cast(count(*) as float)/36) * ((12-MONTH(GETDATE()))/12) Y
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEADD(YEAR, -3, GETDATE())
GROUP BY SITE, CODE
) YBit on S3.Site = Ybit.site AND S3.code = YBit.code 

LEFT JOIN
(
SELECT SITE, CODE, COUNT(*) currFails 
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEFROMPARTS (YEAR(GETDATE()),1,1)
GROUP BY SITE, CODE
) currBit ON s3.site = currBit.site and s3.code = currBit.code

) S4A
;

Ok, suddenly we need start date and end date again. They're in Table2, which also appears likely to be the master list of Site + Code. So lets make that primary and join our calculation to that with left joins. We'll also calculate the site type.

This is your detailed calculation for each site + code.

select left (t2.site, 1) siteType, t2.site, t2.code, 
  [Count], FAilFlagCount, s4.Freq,
   CC, BPO, Forecast,
    T2.StartDate, T2.EndDAte

FROM 
 Table2 T2 
LEFT JOIN
(

SELECT Site, code, [Count], FAilFlagCount, Freq,
   CC, BPO, Y, currFails, Y+currFails totF12,
    CASE WHEN Freq = 0 then 0 else (1-(( Y+currFails)/ Freq))*100 END Forecast
 FROM 
(
SELECT s3.Site, s3.Code, [Count], FailFlagCount, Freq,
CASE WHEN [Count]=0 THEN 0 ELSE  ( 1-(FailFlagCount / [Count]))*100 END  CC,
CASE WHEN Freq = 0 THEN 0 ELSE ( 1-(FailFlagCount / Freq))*100 END BPO,
ISNULL(YBIt.Y, 0) Y,
isnull(currBit.currFails, 0) as currFails

FROM
(
SELECT s2.Site, s2.Code,s2.[Count], S2.FailFlagCount, ISNULL(Table2.Freq, 1) AS Freq
FROM   
(
SELECT Site,Code,COUNT(*) as [Count],
FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)
FROM Table1 
GROUP BY Site, Code
) S2
LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code
) s3

LEFT JOIN
(
SELECT SITE, CODE, count(*) tot, cast(count(*) as float)/36 avg, (cast(count(*) as float)/36) * ((12-MONTH(GETDATE()))/12) Y
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEADD(YEAR, -3, GETDATE())
GROUP BY SITE, CODE
) YBit on S3.Site = Ybit.site AND S3.code = YBit.code 

LEFT JOIN
(
SELECT SITE, CODE, COUNT(*) currFails 
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEFROMPARTS (YEAR(GETDATE()),1,1)
GROUP BY SITE, CODE
) currBit ON s3.site = currBit.site and s3.code = currBit.code

) S4A
) S4 
on t2.site = s4.site and t2.code = s4.code
;

From there, we can easily create the averages.

SELECT siteType, '' as site, '' as code, avg([count]) [count], avg(FailFlagCount) FailFlagCount, avg(Freq) Freq, 
    avg (CC) CC, avg(BPO) BPO, avg(Forecast) Forecast, '' StartDAte, '' EndDate
FROM 
(

select left (t2.site, 1) siteType, t2.site, t2.code, 
  [Count], FAilFlagCount, s4.Freq,
   CC, BPO, Forecast,
    T2.StartDate, T2.EndDAte

FROM 
 Table2 T2 
LEFT JOIN
(

SELECT Site, code, [Count], FAilFlagCount, Freq,
   CC, BPO, Y, currFails, Y+currFails totF12,
    CASE WHEN Freq = 0 then 0 else (1-(( Y+currFails)/ Freq))*100 END Forecast
 FROM 
(
SELECT s3.Site, s3.Code, [Count], FailFlagCount, Freq,
CASE WHEN [Count]=0 THEN 0 ELSE  ( 1-(FailFlagCount / [Count]))*100 END  CC,
CASE WHEN Freq = 0 THEN 0 ELSE ( 1-(FailFlagCount / Freq))*100 END BPO,
ISNULL(YBIt.Y, 0) Y,
isnull(currBit.currFails, 0) as currFails

FROM
(
SELECT s2.Site, s2.Code,s2.[Count], S2.FailFlagCount, ISNULL(Table2.Freq, 1) AS Freq
FROM   
(
SELECT Site,Code,COUNT(*) as [Count],
FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)
FROM Table1 
GROUP BY Site, Code
) S2
LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code
) s3

LEFT JOIN
(
SELECT SITE, CODE, count(*) tot, cast(count(*) as float)/36 avg, (cast(count(*) as float)/36) * ((12-MONTH(GETDATE()))/12) Y
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEADD(YEAR, -3, GETDATE())
GROUP BY SITE, CODE
) YBit on S3.Site = Ybit.site AND S3.code = YBit.code 

LEFT JOIN
(
SELECT SITE, CODE, COUNT(*) currFails 
FROM Table1 
WHERE FailFlag = 'F'
AND ModifiedDate >= DATEFROMPARTS (YEAR(GETDATE()),1,1)
GROUP BY SITE, CODE
) currBit ON s3.site = currBit.site and s3.code = currBit.code

) S4A
) S4 
on t2.site = s4.site and t2.code = s4.code
) S5
GROUP BY siteType
;

Or you could populate a table with the detailed calculation, then calculate the average off that instead of using my final calculation.

I'll leave the final union to you.

like image 150
GregHNZ Avatar answered Sep 28 '22 08:09

GregHNZ