
Formatting a pivot result by merging rows in SQL

The background:

Think of an application that lets people create surveys with custom questions. In one particular case, the survey is used to interview families: an interviewer goes to House 1 and interviews two members, Member 1 and Member 2. He asks questions such as "What is this house's address?" and "What is your name and age?". The answers that are common to all members (such as the address) and the answers that are specific to each member are stored in the same table.

After joining some tables and pivoting the result, I end up with the following table structure.

What was achieved so far

| ID | ADDRESS |   MEMBER |    AGE | SubformIteration  |
|----|---------|----------|--------|-------------------|
|  1 | HOUSE 1 |   (null) | (null) |    (null)         |
|  1 |  (null) | MEMBER 1 |     18 |         s0        |
|  1 |  (null) | MEMBER 2 |     19 |         s1        |
|  2 | HOUSE 2 |   (null) | (null) |    (null)         |
|  2 |  (null) | MEMBER x |     36 |         s0        |
|  2 |  (null) | MEMBER y |     35 |         s1        |
|  3 | HOUSE 3 |   (null) | (null) |    (null)         |
|  3 |  (null) | MEMBER a |     18 |         s0        |
|  3 |  (null) | MEMBER b |     19 |         s1        |

I am trying to find a way to get the table to be formatted as below:

Desired output

| ID | ADDRESS |   MEMBER |    AGE | SubformIteration  |
|----|---------|----------|--------|-------------------|
|  1 | HOUSE 1 | MEMBER 1 |     18 |        s0         |
|  1 | HOUSE 1 | MEMBER 2 |     19 |        s1         |
|  2 | HOUSE 2 | MEMBER x |     36 |        s0         |
|  2 | HOUSE 2 | MEMBER y |     35 |        s1         |
|  3 | HOUSE 3 | MEMBER a |     18 |        s0         |
|  3 | HOUSE 3 | MEMBER b |     19 |        s1         |

As I am new to SQL, I do not have enough SQL vocabulary to describe and search for the operation required to do this. I would be really thankful if anybody could tell me an efficient way to achieve it.

Important

DO NOT RELY UPON THE QuestionText column, as it will change when somebody decides to change the questions.

Edit

Source tables

SQL Fiddle link with all the tables below

As per the suggestions in the answers, I am posting the source tables and the queries in the hope that the problem will be better understood.

Questions table

+------------+--------------+---------+----------+---------------+
| QuestionID | QuestionText |  type   | SurveyID | IsIncremental |
+------------+--------------+---------+----------+---------------+
|       3483 | subform      | subform |      311 |             1 |
|       3484 | MEMBER       | text    |      311 |             0 |
|       3485 | AGE          | number  |      311 |             0 |
|       3486 | ADDRESS      | address |      311 |             0 |
+------------+--------------+---------+----------+---------------+

Results table

+----------+-------------------------+----------+
| ResultID |      DateSubmitted      | SurveyID |
+----------+-------------------------+----------+
|     2272 | 2017-04-12 05:11:41.477 |      311 |
|     2273 | 2017-04-12 05:12:22.227 |      311 |
|     2274 | 2017-04-12 05:13:02.227 |      311 |
+----------+-------------------------+----------+

Chunks table, where all the answers are stored:

+---------+------------+----------+------------+------------------+
| ChunkID |   Answer   | ResultID | QuestionID | SubFormIteration |
+---------+------------+----------+------------+------------------+
|    9606 | HOUSE 1    |     2272 |       3486 | NULL             |
|    9607 | MEMBER 1   |     2272 |       3484 | NULL             |
|    9608 | 12         |     2272 |       3485 | NULL             |
|    9609 | MEMBER 2   |     2272 |       3484 | s1               |
|    9610 | 10         |     2272 |       3485 | s1               |
|    9611 | MEMBER 1   |     2272 |       3484 | s0               |
|    9612 | 12         |     2272 |       3485 | s0               |
|    9613 | MEMBER 2   |     2272 |       3484 | s1               |
|    9614 | 10         |     2272 |       3485 | s1               |
|    9615 | HOUSE 2    |     2273 |       3486 | NULL             |
|    9616 | MEMBER A   |     2273 |       3484 | NULL             |
|    9617 | 23         |     2273 |       3485 | NULL             |
|    9618 | MEMBER B   |     2273 |       3484 | s1               |
|    9619 | 25         |     2273 |       3485 | s1               |
|    9620 | MEMBER A   |     2273 |       3484 | s0               |
|    9621 | 23         |     2273 |       3485 | s0               |
|    9622 | MEMBER B   |     2273 |       3484 | s1               |
|    9623 | 25         |     2273 |       3485 | s1               |
|    9624 | HOUSE 3    |     2274 |       3486 | NULL             |
|    9625 | MEMBER K   |     2274 |       3484 | NULL             |
|    9626 | 41         |     2274 |       3485 | NULL             |
|    9627 | MEMBER J   |     2274 |       3484 | s1               |
|    9628 | 26         |     2274 |       3485 | s1               |
|    9629 | MEMBER K   |     2274 |       3484 | s0               |
|    9630 | 41         |     2274 |       3485 | s0               |
|    9631 | MEMBER J   |     2274 |       3484 | s1               |
|    9632 | 26         |     2274 |       3485 | s1               |
+---------+------------+----------+------------+------------------+

I've written the following stored procedure, which yields the first table shown in this question:

ALTER PROCEDURE [dbo].[ResultForSurvey] @SurveyID int
AS
DECLARE @cols AS NVARCHAR(MAX),
    @query  AS NVARCHAR(MAX),@colsAggregated as nvarchar(max);
IF OBJECT_ID('tempdb.dbo.#Temp', 'U') IS NOT NULL
  DROP TABLE #Temp;
SELECT *
INTO #Temp
FROM (
    SELECT
        Answer = (
            CASE
                WHEN Questions.type = 'checkboxes' OR Questions.IsIncremental = 1 THEN
                    STUFF((SELECT DISTINCT ',' + c.Answer
                           FROM Chunks c
                           WHERE c.ResultID = Results.ResultID
                             AND c.QuestionID = Questions.QuestionID
                             AND (Chunks.SubFormIteration IS NULL)
                           FOR XML PATH(''), TYPE
                          ).value('.', 'NVARCHAR(MAX)')
                    , 1, 1, '')
                ELSE Chunks.Answer
            END),
        Chunks.SubFormIteration,
        Questions.QuestionText,
        Questions.type,
        Questions.QuestionID,
        Chunks.ResultID,
        Results.ResultID AS Action,
        Results.DateSubmitted,
        Results.Username,
        Results.SurveyID
    FROM Chunks
        JOIN Questions ON Questions.QuestionID = Chunks.QuestionID
        JOIN Results ON Results.ResultID = Chunks.ResultID
    WHERE Results.SurveyID = @SurveyID
) AS X


SET @colsAggregated = STUFF((SELECT distinct ','+ 'max('+ QUOTENAME(c.QuestionText)+') as '+ QUOTENAME(c.QuestionText)
            FROM #Temp c
            FOR XML PATH(''), TYPE
            ).value('.', 'NVARCHAR(MAX)')
        ,1,1,'')
        print @colsAggregated
SET @cols = STUFF((SELECT distinct ',' + QUOTENAME(c.QuestionText) 
            FROM #Temp c
            FOR XML PATH(''), TYPE
            ).value('.', 'NVARCHAR(MAX)') 
        ,1,1,'')
set @query = 'SELECT ResultID,max(Username) as Username,max(DateSubmitted) as DateSubmitted,max(SubFormIteration) as SubFormIteration, ' + @colsAggregated + ' from 
            (
                select * 
                from #Temp
           ) as y
            pivot 
            (
                 max(Answer)
                for QuestionText in (' + @cols + ')
            ) as p GROUP BY 
    ResultID,SubFormIteration'
execute(@query)

asked Apr 11 '17 11:04 by insomniac



1 Answer

It may be beneficial to post the query that produced your original results; the original query could possibly be rewritten to avoid this complexity. With the given information, this is the simplest way of solving this problem:

SELECT
   h1.Id,
   h2.Address,
   h1.Member,
   h1.Age,
   h1.SubformIteration
FROM House h1 -- "House" is a placeholder name for the pivoted result shown in the question
  INNER JOIN House h2
  ON h1.Id = h2.Id
WHERE h2.Address IS NOT NULL  -- Keeps only the address row from h2 for each Id
  AND h1.Member IS NOT NULL -- Keeps only the member rows from h1 (drops the address-only row)

Update:

Here is a simple example of the table structure using temp tables:

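-- Drop the temp tables only if they already exist, so the script also runs the first time
IF OBJECT_ID('tempdb..#Questions') IS NOT NULL DROP TABLE #Questions
IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results
IF OBJECT_ID('tempdb..#Chunks') IS NOT NULL DROP TABLE #Chunks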

CREATE TABLE #Questions
(
  QuestionId INT,
  QuestionText VARCHAR(MAX),
  type VARCHAR(MAX),
  SurveyID INT,
  IsIncremental INT
)

CREATE TABLE #Results
(
    ResultId INT,
    DateSubmitted DATETIME,
    SurveyID INT
)

CREATE TABLE #Chunks
(
    ChunkId INT,
    Answer VARCHAR(MAX),
    ResultId INT,
    QuestionId INT,
    SubFormIteration VARCHAR(20)
)

INSERT INTO #Results
VALUES (2272, '20170412', 311), -- yyyymmdd is read the same way under any DATEFORMAT setting
(2273, '20170412', 311),
(2274, '20170412', 311)

INSERT INTO #Chunks
VALUES (9606, 'WhiteHouse', 2272, 3486, NULL),
(9607, 'MEMBER 1', 2272, 3484, NULL),
(9608, '12', 2272, 3485, NULL),
(9609, 'MEMBER 2', 2272, 3484, 's1'),
(9610, '10', 2272, 3485, 's1'),
(9611, 'MEMBER 1', 2272, 3484, 's0'),
(9612, '12', 2272, 3485, 's0'),
(9613, 'MEMBER 2', 2272, 3484, 's1'),
(9614, '10', 2272, 3485, 's1'),
(9615, 'RpBhavan', 2273, 3486, NULL),
(9618, 'MEMBER B', 2273, 3484, 's1'),
(9619, '25', 2273, 3485, 's1'),
(9620, 'MEMBER A', 2273, 3484, 's0'),
(9621, '23', 2273, 3485, 's0')

INSERT INTO #Questions
VALUES (3483, 'subform', 'subform', 311, 1),
( 3484, 'MEMBER', 'text', 311, 0 ),
(3485, 'AGE', 'number', 311, 0),
(3486, 'ADDRESS', 'address', 311, 0)

Here is a way to produce the results you're looking for without the use of PIVOT or XML:

; WITH Responses AS (
SELECT
    c.ResultId,
    QuestionText,
    Answer,
    c.SubFormIteration
FROM #Chunks c
    INNER JOIN #Results r
    ON c.ResultId = r.ResultId
    INNER JOIN #Questions q
    ON q.QuestionId = c.QuestionId
WHERE c.SubFormIteration IS NOT NULL -- Removes the "Address" responses and duplicate Answers
),
FindAddress AS (
-- Pulls ONLY the address for each ResultId
SELECT
    ResultId,
    MAX(CASE WHEN QuestionText = 'ADDRESS' THEN Answer END) AS [Address]
FROM #Chunks c
    INNER JOIN #Questions q
    ON q.QuestionId = c.QuestionId
GROUP BY ResultId
)
-- Combines all responses and the address back together
SELECT
    r.ResultId,
    fa.Address,
    MAX(CASE WHEN QuestionText = 'MEMBER' THEN Answer END) AS [MEMBER],
    MAX(CASE WHEN QuestionText = 'AGE' THEN Answer END) AS [Age],
    SubFormIteration
FROM Responses r
    INNER JOIN FindAddress fa
    ON fa.ResultId = r.ResultId
GROUP BY r.ResultId, SubFormIteration, fa.Address

Essentially, I broke a rather large query into Common Table Expressions (CTEs). Each part has a purpose: a) Responses pulls all responses except the address, b) FindAddress pulls only the address for each ResultId, and c) the final SELECT combines the two.

The MAX(CASE ...) followed by GROUP BY is an alternative to using PIVOT, and the two perform essentially the same.

To apply this query to your specific case, you should only need to change the names of the tables.
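To make that comparison concrete, here is a minimal sketch that runs both forms over the same rows. The temp table #Demo and its contents are made up for illustration and are not part of the original schema.

```sql
-- Hypothetical sample data: one row per (ResultId, SubFormIteration, QuestionText)
IF OBJECT_ID('tempdb..#Demo') IS NOT NULL DROP TABLE #Demo;

CREATE TABLE #Demo
(
    ResultId INT,
    SubFormIteration VARCHAR(20),
    QuestionText VARCHAR(50),
    Answer VARCHAR(50)
);

INSERT INTO #Demo (ResultId, SubFormIteration, QuestionText, Answer)
VALUES (2272, 's0', 'MEMBER', 'MEMBER 1'),
       (2272, 's0', 'AGE',    '12'),
       (2272, 's1', 'MEMBER', 'MEMBER 2'),
       (2272, 's1', 'AGE',    '10');

-- Form 1: conditional aggregation with an explicit GROUP BY
SELECT
    ResultId,
    SubFormIteration,
    MAX(CASE WHEN QuestionText = 'MEMBER' THEN Answer END) AS [MEMBER],
    MAX(CASE WHEN QuestionText = 'AGE' THEN Answer END) AS [AGE]
FROM #Demo
GROUP BY ResultId, SubFormIteration;

-- Form 2: PIVOT; every source column that is not pivoted or aggregated
-- (here ResultId and SubFormIteration) acts as an implicit grouping column
SELECT ResultId, SubFormIteration, [MEMBER], [AGE]
FROM (
    SELECT ResultId, SubFormIteration, QuestionText, Answer
    FROM #Demo
) AS src
PIVOT (
    MAX(Answer) FOR QuestionText IN ([MEMBER], [AGE])
) AS p;
```

Both statements return one row per (ResultId, SubFormIteration) pair, with the member name and age side by side. The practical difference is that PIVOT groups implicitly by whatever columns remain in the source, which is why the derived table lists only the columns it needs.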
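One caveat, given the question's note about not relying on QuestionText: the CASE expressions above match on the question text, so they would stop working if a question were reworded. A possible variant is to match on QuestionId instead. The sketch below assumes the IDs from the sample data (3484 for MEMBER, 3485 for AGE, 3486 for ADDRESS) and uses a small made-up table, #ChunksDemo, so that it can be run on its own.

```sql
-- Hypothetical cut-down copy of the Chunks table, for illustration only
IF OBJECT_ID('tempdb..#ChunksDemo') IS NOT NULL DROP TABLE #ChunksDemo;

CREATE TABLE #ChunksDemo
(
    ChunkId INT,
    Answer VARCHAR(100),
    ResultId INT,
    QuestionId INT,
    SubFormIteration VARCHAR(20)
);

INSERT INTO #ChunksDemo (ChunkId, Answer, ResultId, QuestionId, SubFormIteration)
VALUES (9606, 'HOUSE 1',  2272, 3486, NULL),
       (9609, 'MEMBER 2', 2272, 3484, 's1'),
       (9610, '10',       2272, 3485, 's1'),
       (9611, 'MEMBER 1', 2272, 3484, 's0'),
       (9612, '12',       2272, 3485, 's0');

;WITH FindAddress AS (
    -- One address per ResultId, found by QuestionId rather than by text
    SELECT ResultId,
           MAX(CASE WHEN QuestionId = 3486 THEN Answer END) AS [Address]
    FROM #ChunksDemo
    GROUP BY ResultId
)
SELECT
    c.ResultId,
    fa.[Address],
    MAX(CASE WHEN c.QuestionId = 3484 THEN c.Answer END) AS [MEMBER],
    MAX(CASE WHEN c.QuestionId = 3485 THEN c.Answer END) AS [AGE],
    c.SubFormIteration
FROM #ChunksDemo c
    INNER JOIN FindAddress fa
    ON fa.ResultId = c.ResultId
WHERE c.SubFormIteration IS NOT NULL -- keep only the per-member rows
GROUP BY c.ResultId, fa.[Address], c.SubFormIteration
ORDER BY c.ResultId, c.SubFormIteration;
```

Hard-coded IDs only fit a single, known survey. For the dynamic procedure in the question, the same idea would mean generating the CASE list from the Questions table keyed on QuestionID, and using QuestionText only as the column alias.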

answered Oct 17 '22 06:10 by EnterTheCode