
Merge multiple rows with same ID into one row

How can I merge multiple rows with the same ID into one row?

Rows should be merged when, in every column, the values in the rows are either the same or one is a value and the other is NULL. I don't want to merge rows when the values in the same column are different.

I have table:

ID  A     B     C
1   NULL  31    NULL
1   412   NULL  1
2   567   38    4
2   567   NULL  NULL
3   2     NULL  NULL
3   5     NULL  NULL
4   6     1     NULL
4   8     NULL  5
4   NULL  NULL  5

I want to get table:

ID  A     B     C
1   412   31    1
2   567   38    4
3   2     NULL  NULL
3   5     NULL  NULL
4   6     1     NULL
4   8     NULL  5
4   NULL  NULL  5

Hemus San asked Feb 17 '15 19:02


2 Answers

I think there's a simpler solution than the other answer (which is also correct). It builds the merged row for each ID that can be merged inside a CTE, then appends the original rows for the IDs that cannot be merged.

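WITH CTE AS (
    -- One candidate row per ID; MAX and MIN both ignore NULLs.
    SELECT
        ID,
        MAX(A) AS A,
        MAX(B) AS B,
        MAX(C) AS C
    FROM dbo.Records
    GROUP BY ID
    -- Keep only IDs where each column has a single distinct non-NULL value.
    -- An ID whose column is entirely NULL fails this test (NULL = NULL is not true)
    -- and is returned unmerged.
    HAVING MAX(A) = MIN(A)
        AND MAX(B) = MIN(B)
        AND MAX(C) = MIN(C)
)
SELECT *
FROM CTE
UNION ALL
-- Append the original rows for every ID that could not be merged.
SELECT *
FROM dbo.Records
WHERE ID NOT IN (SELECT ID FROM CTE);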

SQL Fiddle: http://www.sqlfiddle.com/#!6/29407/1/0
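
For trying the query outside the fiddle, a setup along these lines should reproduce the question's data. The table name dbo.Records is the one the query above uses; the int column types are an assumption, since the question doesn't state them.

```sql
-- Assumed schema: the question shows only integer-looking values, so int is a guess.
CREATE TABLE dbo.Records (
    ID int NOT NULL,
    A  int NULL,
    B  int NULL,
    C  int NULL
);

-- Sample rows copied from the question.
INSERT INTO dbo.Records (ID, A, B, C)
VALUES
    (1, NULL, 31,   NULL),
    (1, 412,  NULL, 1),
    (2, 567,  38,   4),
    (2, 567,  NULL, NULL),
    (3, 2,    NULL, NULL),
    (3, 5,    NULL, NULL),
    (4, 6,    1,    NULL),
    (4, 8,    NULL, 5),
    (4, NULL, NULL, 5);
```

With this data, IDs 1 and 2 should each collapse to one row and IDs 3 and 4 should come back unchanged, matching the table in the question. Row order is not guaranteed without an ORDER BY.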

Jason W answered Sep 19 '22 22:09


WITH Collapsed AS (
   SELECT
      ID,
      A = Min(A),
      B = Min(B),
      C = Min(C)
   FROM
      dbo.MyTable
   GROUP BY
      ID
   HAVING
      EXISTS (
         SELECT Min(A), Min(B), Min(C)
         INTERSECT
         SELECT Max(A), Max(B), Max(C)
      )
)
SELECT
   *
FROM
   Collapsed
UNION ALL
SELECT
   *
FROM
   dbo.MyTable T
WHERE
   NOT EXISTS (
      SELECT *
      FROM Collapsed C
      WHERE T.ID = C.ID
);

See this working in a SQL Fiddle

This works by building all the mergeable rows with Min and Max, which return the same value for each column within a mergeable ID and which usefully ignore NULLs, then appending to this list all the rows from the table that couldn't be merged. The special trick with EXISTS ... INTERSECT handles the case where a column has all NULL values for an ID (so that Min and Max are both NULL and can't equal each other). That is, it functions like Min(A) = Max(A) AND Min(B) = Max(B) AND Min(C) = Max(C) but lets NULLs compare as equal.
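
The difference between the two comparisons can be seen with a small sketch. The table variable @T and the ID 5 rows are hypothetical (they are not in the question's data); they were chosen so that column B is NULL in every row for the ID.

```sql
-- Hypothetical data: for ID 5, column B is NULL in every row.
DECLARE @T TABLE (ID int, A int, B int);
INSERT INTO @T (ID, A, B)
VALUES (5, 7, NULL), (5, NULL, NULL);

-- Plain equality: Min(B) = Max(B) evaluates NULL = NULL, which is UNKNOWN,
-- so HAVING filters the group out and this returns no rows.
SELECT ID
FROM @T
GROUP BY ID
HAVING Min(A) = Max(A)
   AND Min(B) = Max(B);

-- INTERSECT treats two NULLs as matching, so (7, NULL) intersects with (7, NULL),
-- EXISTS is true, and this returns ID 5.
SELECT ID
FROM @T
GROUP BY ID
HAVING EXISTS (
   SELECT Min(A), Min(B)
   INTERSECT
   SELECT Max(A), Max(B)
);
```

Both statements must run in the same batch as the DECLARE, because a table variable only lives for the batch that declares it.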

Here's a slightly different (earlier) solution I gave that may offer different performance characteristics. It's more complicated, so I like it less, but it's a single flowing query (without a UNION), so I kind of like it more, too.

WITH Collapsible AS (
   SELECT
      ID
   FROM
      dbo.MyTable
   GROUP BY
      ID
   HAVING
      EXISTS (
         SELECT Min(A), Min(B), Min(C)
         INTERSECT
         SELECT Max(A), Max(B), Max(C)
      )
), Calc AS (
   SELECT
      T.*,
      Grp = Coalesce(C.ID, Row_Number() OVER (PARTITION BY T.ID ORDER BY (SELECT 1)))
   FROM
      dbo.MyTable T
      LEFT JOIN Collapsible C
         ON T.ID = C.ID
)
SELECT
   ID,
   A = Min(A),
   B = Min(B),
   C = Min(C)
FROM
   Calc
GROUP BY
   ID,
   Grp
;

This is also in the above SQL Fiddle.

This uses logic similar to the first query's to decide whether a group should be merged, then uses the result to create a grouping key that is either the same for all rows within an ID or different for every row within an ID. With a final Min (Max would have worked just as well), the rows that should be merged are merged because they share a grouping key, and the rows that shouldn't be merged are not because they have distinct grouping keys within the ID.
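
To make the grouping key concrete, here is a sketch that stops at the Calc step and shows Grp for the question's ID 2 and ID 3 rows. The table variable @T is a stand-in for dbo.MyTable, and the int column types are assumed.

```sql
-- Rows for IDs 2 and 3 from the question, held in a table variable (assumed int columns).
DECLARE @T TABLE (ID int, A int, B int, C int);
INSERT INTO @T (ID, A, B, C)
VALUES (2, 567, 38, 4), (2, 567, NULL, NULL),
       (3, 2, NULL, NULL), (3, 5, NULL, NULL);

WITH Collapsible AS (
   -- IDs whose per-column Min and Max match, with NULLs comparing as equal.
   SELECT ID
   FROM @T
   GROUP BY ID
   HAVING EXISTS (
      SELECT Min(A), Min(B), Min(C)
      INTERSECT
      SELECT Max(A), Max(B), Max(C)
   )
)
SELECT
   T.ID, T.A, T.B, T.C,
   -- Collapsible IDs reuse the ID as the key; other IDs get a per-row number.
   Grp = Coalesce(C.ID, Row_Number() OVER (PARTITION BY T.ID ORDER BY (SELECT 1)))
FROM @T T
LEFT JOIN Collapsible C
   ON T.ID = C.ID
ORDER BY T.ID, Grp;
-- ID 2: both rows get Grp = 2 (the ID itself), so they share a grouping key.
-- ID 3: the rows get Grp = 1 and Grp = 2, so each stays in its own group.
```

Which ID 3 row receives 1 and which receives 2 is arbitrary, because ORDER BY (SELECT 1) imposes no real ordering; that doesn't matter here, since the key only needs to differ between rows.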

Depending on your data set, indexes, table size, and other performance factors, either of these queries may perform better, though the second query has some work to do to catch up, with two sorts instead of one.

ErikE answered Sep 18 '22 22:09