Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get the last dates from multiple columns

This feels like it should be an easy one. How do I get the latest dates that are in different columns

DROP TABLE #indebtedness
CREATE TABLE #indebtedness (call_case CHAR(10), date1 DATETIME, date2 DATETIME, date3 DATETIME)
INSERT #indebtedness VALUES ('Key1', '2019-10-30', '2019-11-30', '2019-10-25')
INSERT #indebtedness VALUES ('Key2', '2019-10-20', '2019-10-30', '2019-10-15')
INSERT #indebtedness VALUES ('Key3', '2019-11-11', '2019-10-29', '2019-10-30')
INSERT #indebtedness VALUES ('Key4',     null    , '2019-10-29', '2019-10-13')

select call_case, ?? AS 'Latest Date' from #indebtedness 

I would like the result to be:

call_case   Latest Date
Key1        2019-11-30 
Key2        2019-10-30 
Key3        2019-11-11 
Key4        2019-10-29 
like image 492
Ahmed Alkhteeb Avatar asked Nov 06 '19 07:11

Ahmed Alkhteeb


6 Answers

Use a CASE expression:

SELECT
    call_case,
    CASE WHEN date1 > date2 AND date1 > date3
         THEN date1
         WHEN date2 > date3
         THEN date2
         ELSE date3 END AS [Latest Date]
FROM #indebtedness;

Demo

Note that some databases, such as MySQL, SQL Server, and SQLite, support a scalar greatest function. SQL Server does not, so we can use a CASE expression as a workaround.

Edit:

It appears that in your actual table, one or more of the three date columns could have NULL values. We can adapt the above query as follows:

SELECT
    call_case,
    CASE WHEN (date1 > date2 OR date2 IS NULL) AND (date1 > date3 OR date3 IS NULL)
         THEN date1
         WHEN date2 > date3 OR date3 IS NULL
         THEN date2
         ELSE date3 END AS [Latest Date]
FROM #indebtedness;

Demo

like image 102
Tim Biegeleisen Avatar answered Oct 22 '22 07:10

Tim Biegeleisen


The currently accepted answer is the best answer, but I don't think it does a good enough job of explaining why. The other answers certainly look much cleaner at a glance (who wants to write that ugly case statement), but are likely to be much worse when you start operating at scale.

SELECT @@VERSION

Microsoft SQL Server 2016 (SP2) (KB4052908) - 13.0.5026.0 (X64) 
Mar 18 2018 09:11:49 
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows 10 Enterprise 10.0 <X64> (Build 17763: )

Here is how I set everything up

DECLARE @Offset bigint = 0;
DECLARE @Max bigint = 10000000;

DROP TABLE IF EXISTS #Indebtedness;
CREATE TABLE #Indebtedness
(
  call_case char(10) COLLATE DATABASE_DEFAULT NOT NULL,
  date1     datetime NULL,
  date2     datetime NULL,
  date3     datetime NULL
);

WHILE @Offset < @Max
BEGIN

  INSERT INTO #Indebtedness
  ( call_case, date1, date2, date3 )
    SELECT @Offset + ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )),
           DATEADD( DAY,
                    CASE WHEN RAND() > 0 THEN 1
                         ELSE -1 END * ROUND( RAND(), 0 ),
                    CURRENT_TIMESTAMP ),
           DATEADD( DAY,
                    CASE WHEN RAND() > 0 THEN 1
                         ELSE -1 END * ROUND( RAND(), 0 ),
                    CURRENT_TIMESTAMP ),
           DATEADD( DAY,
                    CASE WHEN RAND() > 0 THEN 1
                         ELSE -1 END * ROUND( RAND(), 0 ),
                    CURRENT_TIMESTAMP )
      FROM master.dbo.spt_values a
        CROSS APPLY master.dbo.spt_values b;


  SET @Offset = @Offset + ROWCOUNT_BIG();
END;

On my system this gets me 12,872,738 rows in the table. If I try each of the above queries (tweaked to SELECT INTO so I don't need to wait for it to finish printing the results in SSMS), I get the following results:

Method                                | CPU time (ms) | Elapsed time (ms) | Relative Cost
-----------------------------------------------------------------------------------------
Tim Biegeleisen (CASE)                | 13485         | 2167              | 2%
Red Devil (Subquery over MAX columns) | 55187         | 9891              | 14%
Vignesh Kumar (Subquery over columns) | 33750         | 5139              | 5%
Serkan Arslan (UNPIVOT)               | 86205         | 15023             | 12%
Metal (STRING_SPLIT)                  | 459668        | 186742            | 68%

If you look at the query plans, it becomes pretty obvious why - adding any kind of unpivot or aggregate (or heaven forbid STRING_SPLIT) you'll end up with all sorts of additional operators that you don't need (and it forces the plan to go parallel, taking away resources other queries might want). By contract, the CASE based solution does not go parallel, runs very quickly, and is incredibly simple.

In this case, unless you have unlimited resources (you don't), you should pick the simplest and fastest approach.


There was a question of what to do if you need to keep adding new columns and expanding the case statement. Yes, this gets unwieldy, but so does every other solution. If this is actually a plausible workflow, then you should redesign your table. What you want probably looks something like this:

CREATE TABLE #Indebtedness2
(
  call_case     char(10) COLLATE DATABASE_DEFAULT NOT NULL,
  activity_type bigint   NOT NULL,  -- This indicates which date# column it was, if you care
  timestamp     datetime NOT NULL
);

SELECT Indebtedness.call_case,
       Indebtedness.activity_type,
       Indebtedness.timestamp
  FROM ( SELECT call_case,
                activity_type,
                timestamp,
                ROW_NUMBER() OVER ( PARTITION BY call_case
                                    ORDER BY timestamp DESC ) RowNumber
           FROM #Indebtedness2 ) Indebtedness
  WHERE Indebtedness.RowNumber = 1;

This is certainly not free of potential performance issues, and will require careful index tuning, but is the best way to handle an arbitrary number of potential timestamps


In case any answers get deleted, here are the versions I was comparing (in order)

SELECT
    call_case,
    CASE WHEN date1 > date2 AND date1 > date3
         THEN date1
         WHEN date2 > date3
         THEN date2
         ELSE date3 END AS [Latest Date]
FROM #indebtedness;

SELECT call_case,
  (SELECT Max(v) 
   FROM (VALUES (date1), (date2), (date3),...) AS value(v)) as [MostRecentDate]
FROM #indebtedness

SELECT call_case,
  (SELECT
     MAX(call_case) 
   FROM ( VALUES 
            (MAX(date1)), 
            (MAX(date2))
            ,(max(date3)) 
        ) MyAlias(call_case)
  ) 
FROM #indebtedness
group by call_case

select call_case, MAX(date)  [Latest Date] from #indebtedness 
UNPIVOT(date FOR col IN ([date1], [date2], [date3])) UNPVT
GROUP BY call_case

select call_case , max(cast(x.Item as date)) as 'Latest Date' from #indebtedness  t
cross apply dbo.SplitString(concat(date1, ',', date2, ',', date3), ',') x
group by call_case
like image 33
Dan Oberlam Avatar answered Oct 22 '22 06:10

Dan Oberlam


Try this:

SELECT call_case,
  (SELECT
     MAX(call_case) 
   FROM ( VALUES 
            (MAX(date1)), 
            (MAX(date2))
            ,(max(date3)) 
        ) MyAlias(call_case)
  ) 
FROM #indebtedness
group by call_case
like image 11
Red Devil Avatar answered Oct 22 '22 05:10

Red Devil


SQL FIDDLE

Use MAX()

SELECT call_case,
  (SELECT Max(v) 
   FROM (VALUES (date1), (date2), (date3),...) AS value(v)) as [MostRecentDate]
FROM #indebtedness

Use CASE

 SELECT
        CASE
            WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
            WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
            WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
            ELSE                                        Date1
        END AS MostRecentDate
 FROM  #indebtedness
like image 8
Vignesh Kumar A Avatar answered Oct 22 '22 05:10

Vignesh Kumar A


In my view, Pivot is the best and efficient option for this query. Copy and Paste in the MS SQL SERVER. Please check the code written below:

CREATE TABLE #indebtedness (call_case CHAR(10), date1 DATETIME, date2 DATETIME, date3 DATETIME)
INSERT #indebtedness VALUES ('Key1', '2019-10-30', '2019-11-30', '2019-10-31')
INSERT #indebtedness VALUES ('Key2', '2019-10-20', '2019-10-30', '2019-11-21')
INSERT #indebtedness VALUES ('Key3', '2019-11-11', '2019-10-29', '2019-10-30')
INSERT #indebtedness VALUES ('Key4', Null, '2019-10-29', '2019-10-13')

--Solution-1:
SELECT        
    call_case,
    MAX(RecnetDate) as MaxDateColumn         
FROM #indebtedness
UNPIVOT
(RecnetDate FOR COL IN ([date1], [date2], [date3])) as TRANSPOSE
GROUP BY call_case 

--Solution-2:
select 
    call_case, case 
    when date1>date2 and date1 > date3 then date1
    when date2>date3                   then date2
    when date3>date1                   then date1 
   else date3 end as date
from #indebtedness as a 


Drop table #indebtedness
like image 1
Satheesh Avatar answered Oct 22 '22 06:10

Satheesh


This really should be re-evaluated at the design level as others have indicated. Below is an example of a different design using two tables to better accomplish what it appears you are looking for in your results. This will make growth much more favorable.

Here is an example (different table names used):

-- Drop pre-existing tables
DROP TABLE #call_log
DROP TABLE #case_type

-- Create table for Case Types
CREATE TABLE #case_type (id INT PRIMARY KEY CLUSTERED NOT NULL, 
    descript VARCHAR(50) NOT NULL)
INSERT #case_type VALUES (1,'No Answer')
INSERT #case_type VALUES (2,'Answer')
INSERT #case_type VALUES (3,'Not Exist')
INSERT #case_type VALUES (4,'whatsapp')
INSERT #case_type VALUES (5,'autodial')
INSERT #case_type VALUES (6,'SMS')

-- Create a Call Log table with a primary identity key and also an index on the call types
CREATE TABLE #call_log (call_num BIGINT PRIMARY KEY CLUSTERED IDENTITY NOT NULL,
    call_type INT NOT NULL REFERENCES #case_type(id), call_date DATETIME)
CREATE NONCLUSTERED INDEX ix_call_log_entry_type ON #call_log(call_type)
INSERT #call_log(call_type, call_date) VALUES (1,'2019-11-30')
INSERT #call_log(call_type, call_date) VALUES (2,'2019-10-15')
INSERT #call_log(call_type, call_date) VALUES (3,null)
INSERT #call_log(call_type, call_date) VALUES (3,'2019-10-29')
INSERT #call_log(call_type, call_date) VALUES (1,'2019-10-25')
INSERT #call_log(call_type, call_date) VALUES (2,'2019-10-30')
INSERT #call_log(call_type, call_date) VALUES (3,'2019-10-13')
INSERT #call_log(call_type, call_date) VALUES (2,'2019-10-20')
INSERT #call_log(call_type, call_date) VALUES (1,'2019-10-30')

-- use an aggregate to show only the latest date for each case type
SELECT DISTINCT ct.descript, MAX(cl.call_date) AS "Date" 
    FROM #call_log cl JOIN #case_type ct ON cl.call_type = ct.id GROUP BY ct.descript

This allows for more case types to be added, many more log entries to be added and provides a better design.

This is just an example for learning purposes.

like image 1
Enoch Avatar answered Oct 22 '22 05:10

Enoch