I have an interesting conundrum which I believe can be solved purely in SQL. I have tables similar to the following:
responses:

    user_id | question_id | body
    ----------------------------
    1       | 1           | Yes
    2       | 1           | Yes
    1       | 2           | Yes
    2       | 2           | No
    1       | 3           | No
    2       | 3           | No

questions:

    id | body
    -------------------------
    1  | Do you like apples?
    2  | Do you like oranges?
    3  | Do you like carrots?
and I would like to get the following output
    user_id | Do you like apples? | Do you like oranges? | Do you like carrots?
    ---------------------------------------------------------------------------
    1       | Yes                 | Yes                  | No
    2       | Yes                 | No                   | No
I don't know how many questions there will be, and they will be dynamic, so I can't just code for every question. I am using PostgreSQL and I believe this is called transposition, but I can't seem to find anything that says the standard way of doing this in SQL. I remember doing this in my database class back in college, but it was in MySQL and I honestly don't remember how we did it.
I'm assuming it will be a combination of joins and a GROUP BY statement, but I can't even figure out how to start.
Anybody know how to do this? Thanks very much!
Edit 1: I found some information about using a crosstab which seems to be what I want, but I'm having trouble making sense of it. Links to better articles would be greatly appreciated!
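For reference, a minimal sketch of what a crosstab call against these tables might look like. It assumes the tablefunc extension can be installed, that user_id is an integer and body is text (the question does not state the column types), and the alias ct is just an illustrative name. Note that the output columns still have to be listed by hand, so this alone does not solve the dynamic-columns part.

```sql
-- crosstab() lives in the tablefunc extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    -- Source query: row key, category, value. Must be ordered by the row key.
    $$ SELECT r.user_id, r.question_id, r.body
       FROM responses r
       ORDER BY 1, 2 $$,
    -- Category query: one row per output column, in output order.
    $$ SELECT id FROM questions ORDER BY id $$
) AS ct (
    user_id                integer,  -- assumed type
    "Do you like apples?"  text,
    "Do you like oranges?" text,
    "Do you like carrots?" text
);
```

The two-argument form is used here so that a user who skipped a question gets a NULL in the right column rather than having later answers shifted left.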
If you want to transpose only selected row values into columns, you can add a WHERE clause to your first SELECT ... GROUP_CONCAT statement. If you want to filter rows in your final pivot table, you can add the WHERE clause in your SET statement.
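The statements this answer refers to are not reproduced here. GROUP_CONCAT is a MySQL function, so the answer presumably means the common MySQL dynamic-pivot pattern; the following is a sketch of that pattern under that assumption, not the answerer's original code, and it will not run on PostgreSQL. The variable names @cols and @sql and the column aliases q1, q2, ... are illustrative choices.

```sql
-- MySQL only. Step 1: build one MAX(CASE ...) expression per question.
SET @cols = NULL;

SELECT GROUP_CONCAT(
           CONCAT('MAX(CASE WHEN r.question_id = ', q.id,
                  ' THEN r.body END) AS `q', q.id, '`')
           ORDER BY q.id SEPARATOR ', ')
  INTO @cols
  FROM questions q;
  -- A WHERE clause on this SELECT limits which questions become columns,
  -- e.g. WHERE q.id IN (1, 2).

-- Step 2: assemble the full pivot query as a string.
-- A WHERE clause placed inside this string filters rows of the final result,
-- e.g. ' FROM responses r WHERE r.user_id = 1 GROUP BY r.user_id'.
SET @sql = CONCAT('SELECT r.user_id, ', @cols,
                  ' FROM responses r GROUP BY r.user_id');

-- Step 3: run it as a prepared statement.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```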
Use:
    SELECT r.user_id,
           MAX(CASE WHEN r.question_id = 1 THEN r.body ELSE NULL END) AS "Do you like apples?",
           MAX(CASE WHEN r.question_id = 2 THEN r.body ELSE NULL END) AS "Do you like oranges?",
           MAX(CASE WHEN r.question_id = 3 THEN r.body ELSE NULL END) AS "Do you like carrots?"
      FROM RESPONSES r
      JOIN QUESTIONS q ON q.id = r.question_id
     GROUP BY r.user_id
This is a standard pivot query, because you are "pivoting" the data from rows to columnar data.
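Since the question says the set of questions is not known in advance, one possible extension (a sketch, not part of the answer above) is to let PostgreSQL generate the text of this pivot query from the questions table. It relies only on the built-in format() and string_agg() functions; the output column name pivot_sql is an illustrative choice.

```sql
-- Produces a single row containing the text of a pivot query,
-- with one MAX(CASE ...) column per row in questions.
SELECT format(
         'SELECT r.user_id, %s FROM responses r GROUP BY r.user_id ORDER BY r.user_id',
         string_agg(
           -- %s inserts the question id; %I quotes the question text as an identifier.
           format('MAX(CASE WHEN r.question_id = %s THEN r.body END) AS %I',
                  q.id, q.body),
           ', ' ORDER BY q.id
         )
       ) AS pivot_sql
FROM questions q;
```

In psql, ending the statement with \gexec instead of the semicolon should execute the generated query directly; from application code, you would run the returned string as a second query. Keep in mind that PostgreSQL truncates identifiers longer than 63 bytes by default, so very long question texts would be cut off as column names.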
I implemented a truly dynamic function to handle this problem without having to hard code any specific class of answers or use external modules/extensions. It also gives full control over column ordering and supports multiple key and class/attribute columns.
You can find it here: https://github.com/jumpstarter-io/colpivot
Example that solves this particular problem:
    begin;

    create temporary table responses (
        user_id integer,
        question_id integer,
        body text
    ) on commit drop;

    create temporary table questions (
        id integer,
        body text
    ) on commit drop;

    insert into responses values (1,1,'Yes'), (2,1,'Yes'), (1,2,'Yes'), (2,2,'No'), (1,3,'No'), (2,3,'No');
    insert into questions values (1, 'Do you like apples?'), (2, 'Do you like oranges?'), (3, 'Do you like carrots?');

    select colpivot('_output', $$
        select r.user_id, q.body q, r.body a
        from responses r
        join questions q on q.id = r.question_id
    $$, array['user_id'], array['q'], '#.a', null);

    select * from _output;

    rollback;
This outputs:
     user_id | 'Do you like apples?' | 'Do you like carrots?' | 'Do you like oranges?'
    ---------+-----------------------+------------------------+------------------------
           1 | Yes                   | No                     | Yes
           2 | Yes                   | No                     | No