BACKGROUND:
I have data that looks like this
date src subsrc subsubsrc param1 param2
2020-02-01 src1 ksjd dfd8 47 31
2020-02-02 src1 djsk zmnc 44 95
2020-02-03 src2 skdj awes 92 100
2020-02-04 src2 mxsf kajs 80 2
2020-02-05 src3 skdj asio 46 53
2020-02-06 src3 dekl jdqo 19 18
2020-02-07 src3 dskl dqqq 69 18
2020-02-08 src4 sqip riow 64 46
2020-02-09 src5 ss01 qwep 34 34
I am trying to aggregate for all time, last 30 days and last 90 days (no rolling sum)
So my final data would look like this:
src subsrc subsubsrc p1_all p1_30 p1_90 p2_all p2_30 p2_90
src1 ksjd dfd8 7 1 7 98 7 98
src1 djsk zmnc 0 0 0 0 0 0
src2 skdj awes 12 12 12 4 4 4
src2 mxsf kajs 6 6 6 31 31 31
src3 skdj asio 0 0 0 0 0 0
src3 dekl jdqo 20 20 20 17 17 17
src3 dskl dqqq 3 3 3 4 4 4
src4 sqip qwep 0 0 0 0 0 0
src5 ss01 qwes 15 15 15 2 2 2
ABOUT DATA:
WHAT I HAVE TRIED:
This is what I have come up with:
SELECT src, subsubsrc, subsubsrc,
SUM(param1) as param1_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param1 END) as param1_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param1 END) as param1_90,
SUM(param2) as param2_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param2 END) as param2_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param2 END) as param2_90,
FROM `MY_TABLE`
GROUP BY src
ORDER BY src
This actually works but I can anticipate how long this query is going to become for multiple sources and even more param columns.
I have been trying something called "Filtered aggregate functions (or manual pivot)" explained HERE. But I am unable to understand/implement it for my case.
Also I have looked at dozens of answers and most of them are running sums for each day OR are complicated cases of this basic calculation. Maybe I am not searching it correctly.
As you can see I am newbie in SQL and would really appreciate any help.
Your query looks quite good; conditional aggregation is the canonical method to pivot a dataset.
One way to possibly increase performance would be to change the date filter in the conditional expressions: using a date function precludes the use of an index.
Instead, you could phrase this as:
select
src,
subsrc,
subsubsrc,
sum(param1) as param1_all,
sum(case when date >= current_date - interval 30 day then param1 end) as param1_30,
sum(case when date >= current_date - interval 90 day then param1 end) as param1_90,
sum(param2) as param2_all,
sum(case when date >= current_date - interval 30 day then param2 end) as param2_30,
sum(case when date >= current_date - interval 90 day then param2 end) as param2_90
from my_table
group by src, subsrc, subsubsrc
order by src, subsrc, subsubsrc
For performance, the following index may be helpul: (src, subsrc, subsubsrc, date)
.
Note that I included all three non-aggregated columns (src, subsrc, subsubsrc
) in the group by
clause: starting MySQL 5.7, this is by default mandatory (although you can play around with sql modes to alter that behavior) - and most other databases implement the same constraint.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With