I am working with some course data in a Presto database. The data in the table looks like:
student_id period score completed
1 2016_Q1 3 Y
1 2016_Q3 4 Y
3 2017_Q1 4 Y
4 2018_Q1 2 N
I would like to format the data so that it looks like:
student_id 2018_Q1_score 2018_Q1_completed 2017_Q3_score
1 0 N 5
3 4 Y 4
4 2 N 2
I know that I could do this by joining to the table for each time period, but I wanted to ask here to see if any gurus had a recommendation for a more scalable solution (e.g. perhaps not having to manually create a new join for each period). Any suggestions?
Use of PIVOT / UNPIVOTYou can use the PIVOT and UNPIVOT operators in standard SQL, Hive, and Presto. The PIVOT operator transforms rows into columns. The UNPIVOT operator transforms columns into rows.
The design of the PIVOT and UNPIVOT operators sometimes leads to bugs and pitfalls in your code. The PIVOT operator's syntax doesn't let you explicitly indicate the grouping element. If you don't realize this, you can end up with undesired grouping elements.
PIVOT carries out an aggregation and merges possible multiple rows into a single row in the output. UNPIVOT doesn't reproduce the original table-valued expression result because rows have been merged. Also, null values in the input of UNPIVOT disappear in the output.
Using a T-SQL Pivot function is one of the simplest method for transposing rows into columns.
You can just use conditional aggregation:
select student_id,
max(case when period = '2018_Q1' then score else 0 end) as score_2018q1,
max(case when period = '2018_Q1' then completed then 'N' end) as completed_2018q1,
max(case when period = '2017_Q3' then score else 0 end) as score_2017q3
from t
group by student_id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With