Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine First and Last Names of people based on criteria

So I have an interesting scenario. There are 2 events, for which one of the Married EventType I'd like to join associated married couples of the house, grouped by an identifier (HouseID; 2 people in the same house), the EventType, and the EventDate. For the case such as an EventType of Birthday, the 2 residents of the house would not be combined in the same row. In the case of an EventType of Wedding, combine the name result of 2 rows (2 rules below based on LastName) into 1. It's possible to have only 1 person for a HouseID for the EventType of Wedding or Birthday, so therefore they would list as an individual row. The combining rules for EventType of Wedding would be as follows:

  1. If the last names are the same, the FinalName column result would be Will and Mary Stanton
  2. If the last names are different, the FinalName column result would be Stephen Jacobs and Janetsy Lilly.

The combining of the names would be contingent on the HouseID, EventType, and EventDate being the same, specific only to Wedding. This is because it's possible for 2 people to live in a house that are married, but not to each other which we base off the EventDate; we assume the EventDate is the indicator that they are married to each other. The example table input is as follows:

DECLARE @t TABLE (
    HouseID INT,
    FirstName NVARCHAR(64),
    LastName NVARCHAR(64),
    EventType NVARCHAR(64),
    EventDate DATE
);

INSERT INTO @t (HouseID, FirstName, LastName, EventType, EventDate)
VALUES
    (1, 'Will', 'Stanton', 'Birthday', '1974-01-05'),
    (1, 'Mary', 'Stanton', 'Birthday', '1980-05-22'),
    (2, 'Jason', 'Stockmore', 'Birthday', '1987-12-07'),
    (3, 'Mark', 'Mellony', 'Wedding', '2021-04-04'),
    (3, 'Stacy', 'Mellony', 'Wedding', '2021-04-04'),
    (4, 'Stephen', 'Johnson', 'Wedding', '2012-01-30'),
    (4, 'Janetsy', 'Johnson', 'Wedding', '2012-01-30'),
    (5, 'George', 'Jackson', 'Wedding', '2009-11-15'),
    (5, 'Sally', 'Mistmoore', 'Wedding', '2009-11-15'),
    (6, 'Sandy', 'Katz', 'Wedding', '2010-03-19'),
    (6, 'Jeff', 'Trilov', 'Wedding', '2016-09-09'),
    (7, 'Sandra', 'Kirchbaum', 'Wedding', '2011-05-22'),
    (8, 'Jessica', 'Bower', 'Birthday', '1996-02-26'),
    (8, 'Frank', 'Fjorn', 'Birthday', '1969-07-19');

The ideal result based on the input table would resemble:

| HouseID | FinalName                          | EventType | EventDate  |
| ------- | ---------------------------------- | --------- | ---------- |
| 1       | Mary Stanton                       | Birthday  | 1974-01-05 |
| 1       | Will Stanton                       | Birthday  | 1980-05-22 |
| 2       | Jason Stockmore                    | Birthday  | 1987-12-07 |
| 3       | Mark and Stacy Mellony             | Wedding   | 2021-04-04 |
| 4       | Stephen and Janetsy Johnson        | Wedding   | 2012-01-30 |
| 5       | George Jackson and Sally Mistmoore | Wedding   | 2009-11-15 |
| 6       | Sandy Katz                         | Wedding   | 2010-03-19 |
| 6       | Jeff Trilov                        | Wedding   | 2016-09-09 |
| 7       | Sandra Kirchbaum                   | Wedding   | 2011-05-22 |
| 8       | Jessica Bower                      | Birthday  | 1996-02-26 |
| 8       | Frank Fjorn                        | Birthday  | 1969-07-19 |

I have tried a couple of approaches; one of which is using an update statement to update the First Name and update a subsequent FirstName based on Row number using a previously set @Values variable, unioning the result of the types that do not combine (in this case, Birthday). Here is where I built the names then used the MAX() aggregation to select the larger result:

SELECT HouseID,
       FirstName
     , LastName
     , EventType
     , EventDate
     , RowNum = ROW_NUMBER() OVER (PARTITION BY LastName, EventType ORDER BY 1/0)
     , Values1 = CAST(NULL AS VARCHAR(MAX))
INTO #EntityValues1
FROM @t
WHERE EventType = 'Wedding'

UPDATE #EntityValues1
SET @Values1 = Values1 =
        CASE WHEN RowNum = 1 
            THEN FirstName
            ELSE @Values1 + ' and ' + FirstName 
        END

However this example only works with combining FirstName1 + FirstName2 + LastName (in a subsequent query with MAX(Values1) + ' ' + LastName. I had to do a subsequent query to take the approach where I am combining names that do not have the same last name. I know this particular query is a bit tricky, but I'm wondering if there's any magic I'm missing out on. I've seen some suggestions that use a FOR XML approach with STUFF involved, and some other suggestions, but this one appears to be a tough one.

like image 279
Novatone Avatar asked Oct 21 '25 03:10

Novatone


1 Answers

Here is an answer, with a working demo

;WITH
[DoubleWeddings] AS (
SELECT
            [HouseID]
    FROM
            @t
    WHERE
            [EventType] = 'Wedding'
    GROUP BY
            [HouseID],
            [EventDate]
        HAVING
            COUNT(*) = 2
),
[DoubleWeddingsSameLastName] AS (
SELECT
            T.[HouseID]
    FROM
            [DoubleWeddings] DW
        JOIN
            @t T
                ON T.[HouseID] = DW.[HouseID]
    GROUP BY
            T.[HouseID],
            T.[LastName]
        HAVING
            COUNT(*) = 2
),
[DoubleWeddingsDifferentLastName] AS (
SELECT [HouseID] FROM [DoubleWeddings]
EXCEPT
SELECT [HouseID] FROM [DoubleWeddingsSameLastName]
),
[Couples] AS ( 
SELECT
           T.[HouseID],
           ROW_NUMBER() OVER (PARTITION BY T.[HouseID] ORDER BY 1/0) [RN],
           T.[FirstName],
           T.[LastName],
           T.[EventType],
           T.[EventDate]
    FROM
           [DoubleWeddings] DW
        JOIN
           @t T
               ON T.[HouseID] = DW.[HouseID]
)
SELECT
            DWSL.[HouseID],
            FORMATMESSAGE(
                '%s and %s %s',
                F.[FirstName],
                S.[FirstName],
                S.[LastName]) [FinalName],
            F.[EventType],
            F.[EventDate]
    FROM
            [DoubleWeddingsSameLastName] DWSL
        JOIN
            [Couples] F
                ON F.[HouseID] = DWSL.[HouseID] AND F.[RN] = 1
        JOIN
            [Couples] S
                ON S.[HouseID] = DWSL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
            DWDL.[HouseID],
            FORMATMESSAGE(
                '%s %s and %s %s',
                F.[FirstName],
                F.[LastName],
                S.[FirstName],
                S.[LastName]) [FinalName],
            F.[EventType],
            F.[EventDate]
    FROM
            [DoubleWeddingsDifferentLastName] DWDL
        JOIN
            [Couples] F
                ON F.[HouseID] = DWDL.[HouseID] AND F.[RN] = 1
        JOIN
            [Couples] S
                ON S.[HouseID] = DWDL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
            T.[HouseID],
            FORMATMESSAGE(
                '%s %s',
                T.[FirstName],
                T.[LastName]) [FinalName],
            T.[EventType],
            T.[EventDate]
    FROM
            @t T
        LEFT JOIN
            [DoubleWeddings] DW
                ON DW.[HouseID] = T.[HouseID]
    WHERE
            DW.[HouseID] IS NULL
    ORDER BY
            [HouseID],
            [EventDate],
            [FinalName]

There is one issue, I've used the ORDER BY 1/0 technique to skip providing an order for the ROW_NUMBER() which tends to assign the row numbers in the order of the underlying data. However, this is not guaranteed, and could very depending on the parallelization of the query.

It would be better if the order of the combination was provided by a column in the data, however, none is present in the example.

like image 57
Jodrell Avatar answered Oct 23 '25 17:10

Jodrell



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!