Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select rows with no date range overlap

Imagine the following Loans table:

BorrowerID       StartDate         DueDate
=============================================
1                2012-09-02        2012-10-01
2                2012-10-05        2012-10-21
3                2012-11-07        2012-11-09
4                2012-12-01        2013-01-01
4                2012-12-01        2013-01-14
1                2012-12-20        2013-01-06
3                2013-01-07        2013-01-22
3                2013-01-15        2013-01-18
1                2013-02-20        2013-02-24

How would I go about selecting the distinct BorrowerIDs of those who have only ever taken out a single loan at a time? This includes borrowers who have only ever taken out a single loan, as well as those who have taken out more than one, provided if you were to draw a time line of their loans, none of them would overlap. For example, in the table above, it should find borrowers 1 and 2 only.

I've tried experimenting with joining the table to itself, but haven't really managed to get anywhere. Any pointers much appreciated!

like image 726
George Brighton Avatar asked Oct 13 '13 22:10

George Brighton


People also ask

How do you calculate overlap time?

data RESULTS; set EVENTS; OVERLAP = min(A2,B2) - max(A1,B1) + 1; if OVERLAP<0 then OVERLAP = 0; run; We can also zero out the negative values of variable x using max(0, x) expression that results in the following formula for the date ranges overlap calculation: Overlap = max(0, min(A2, B2) - max(A1, B1) + 1).

What is the meaning of overlapping dates?

Overlapping date a day and month in any year during the deposit period, whose number is the same as the number of the day and month on which the deposit commencement date falls.


2 Answers

Solution for dbo.Loan with PRIMARY KEY

To solve this you need a two step approach as detailed in the following SQL Fiddle. I did add a LoanId column to your example data and the query requires that such a unique id exists. If you don't have that, you need to adjust the join clause to make sure that a loan does not get matched to itself.

MS SQL Server 2008 Schema Setup:

CREATE TABLE dbo.Loans
    (LoanID INT, [BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
GO

INSERT INTO dbo.Loans
    (LoanID, [BorrowerID], [StartDate], [DueDate])
VALUES
    (1, 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
    (2, 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
    (3, 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
    (4, 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
    (5, 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
    (6, 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
    (7, 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
    (8, 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
    (9, 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
GO

First you need to find out which loans overlap with another loan. The query uses <= to compare the start and due dates. That counts loans where the second one starts the same day the first one ends as overlapping. If you need those to not be overlapping use < instead in both places.

Query 1:

SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L2.LoanID <> L1.LoanID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate) 
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM dbo.Loans L1;

Results:

| LOANID | BORROWERID |                        STARTDATE |                         DUEDATE | HASOVERLAPPINGLOAN |
|--------|------------|----------------------------------|---------------------------------|--------------------|
|      1 |          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |                  0 |
|      2 |          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |                  0 |
|      3 |          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |                  0 |
|      4 |          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |                  1 |
|      5 |          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |                  1 |
|      6 |          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |                  0 |
|      7 |          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |                  1 |
|      8 |          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |                  1 |
|      9 |          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |                  0 |

Now, with that information you can determine the borrowers that have no overlapping loans with this query:

Query 2:

WITH OverlappingLoans AS (
  SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L2.LoanID <> L1.LoanID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate) 
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM dbo.Loans L1
),
OverlappingBorrower AS (
  SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
    FROM OverlappingLoans
   GROUP BY BorrowerID
)
SELECT * 
  FROM OverlappingBorrower
 WHERE hasOverlappingLoan = 0;

Or you could even get more information by counting the loans as well as counting the number of loans that have overlapping other loans for each borrower in the database. (Note, if loan A and loan B overlap, both will be counted as overlapping loan by this query)

Results:

| BORROWERID | HASOVERLAPPINGLOAN |
|------------|--------------------|
|          1 |                  0 |
|          2 |                  0 |

Query 3:

WITH OverlappingLoans AS (
  SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L2.LoanID <> L1.LoanID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate) 
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM dbo.Loans L1
)
SELECT BorrowerID,COUNT(1) LoanCount, SUM(hasOverlappingLoan) OverlappingCount
  FROM OverlappingLoans
 GROUP BY BorrowerID;

Results:

| BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
|------------|-----------|------------------|
|          1 |         3 |                0 |
|          2 |         1 |                0 |
|          3 |         3 |                2 |
|          4 |         2 |                2 |


Solution for dbo.Loan without PRIMARY KEY

UPDATE: As the requirement actually calls for a solution that does not rely on a unique identifier for each loan, I made the following changes:

1) I added a borrower that has two loans with the same start and due dates

SQL Fiddle

MS SQL Server 2008 Schema Setup:

CREATE TABLE dbo.Loans
    ([BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
GO

INSERT INTO dbo.Loans
    ([BorrowerID], [StartDate], [DueDate])
VALUES
    ( 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
    ( 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
    ( 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
    ( 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
    ( 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
    ( 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
    ( 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
    ( 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
    ( 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
    ( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
    ( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
GO

2) Those "equal date" loans require an additional step:

Query 1:

SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
  FROM dbo.Loans
 GROUP BY BorrowerID, StartDate, DueDate;

Results:

| BORROWERID |                        STARTDATE |                         DUEDATE | LOANCOUNT |
|------------|----------------------------------|---------------------------------|-----------|
|          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |         1 |
|          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |         1 |
|          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         1 |
|          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |         1 |
|          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |         1 |
|          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |         1 |
|          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |         1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |         1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |         1 |
|          5 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         2 |

3) Now, with each loan range unique, we can use the old technique again. However, we also need to account for those "equal date" loans. (L1.StartDate <> L2.StartDate OR L1.DueDate <> L2.DueDate) prevents a loan getting matched with itself. OR LoanCount > 1 accounts for "equal date" loans.

Query 2:

WITH NormalizedLoans AS (
  SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
    FROM dbo.Loans
   GROUP BY BorrowerID, StartDate, DueDate  
)
SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate
                       AND (L1.StartDate <> L2.StartDate
                            OR L1.DueDate <> L2.DueDate)
                   ) 
             OR LoanCount > 1
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM NormalizedLoans L1;

Results:

| BORROWERID |                        STARTDATE |                         DUEDATE | LOANCOUNT | HASOVERLAPPINGLOAN |
|------------|----------------------------------|---------------------------------|-----------|--------------------|
|          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |         1 |                  0 |
|          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |         1 |                  0 |
|          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         1 |                  0 |
|          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |         1 |                  0 |
|          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |         1 |                  0 |
|          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |         1 |                  1 |
|          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |         1 |                  1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |         1 |                  1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |         1 |                  1 |
|          5 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         2 |                  1 |

This query logic did not change (other than switching out the beginning).

Query 3:

WITH NormalizedLoans AS (
  SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
    FROM dbo.Loans
   GROUP BY BorrowerID, StartDate, DueDate  
),
OverlappingLoans AS (
SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate
                       AND (L1.StartDate <> L2.StartDate
                            OR L1.DueDate <> L2.DueDate)
                   ) 
             OR LoanCount > 1
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM NormalizedLoans L1
),
OverlappingBorrower AS (
  SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
    FROM OverlappingLoans
   GROUP BY BorrowerID
)
SELECT * 
  FROM OverlappingBorrower
 WHERE hasOverlappingLoan = 0;

Results:

| BORROWERID | HASOVERLAPPINGLOAN |
|------------|--------------------|
|          1 |                  0 |
|          2 |                  0 |

4) In this counting query we need to incorporate the "equal date" loan counts again. For that we use SUM(LoanCount) instead of a plain COUNT. We also have to multiply hasOverlappingLoan with the LoanCount to get the correct overlapping count again.

Query 4:

WITH NormalizedLoans AS (
  SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
    FROM dbo.Loans
   GROUP BY BorrowerID, StartDate, DueDate  
),
OverlappingLoans AS (
SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate
                       AND (L1.StartDate <> L2.StartDate
                            OR L1.DueDate <> L2.DueDate)
                   ) 
             OR LoanCount > 1
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM NormalizedLoans L1
)
SELECT BorrowerID,SUM(LoanCount) LoanCount, SUM(hasOverlappingLoan*LoanCount) OverlappingCount
  FROM OverlappingLoans
 GROUP BY BorrowerID;

Results:

| BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
|------------|-----------|------------------|
|          1 |         3 |                0 |
|          2 |         1 |                0 |
|          3 |         3 |                2 |
|          4 |         2 |                2 |
|          5 |         2 |                2 |

I strongly suggest finding a way to use my first solution, as a loan table without a primary key is a, let's say "odd" design. However, if you really can't get there, use the second solution.

like image 138
Sebastian Meine Avatar answered Oct 19 '22 07:10

Sebastian Meine


I got it working but in a bit convoluted way. It first gets borrowers that don't meet criteria in the inner query and returns the rest. The inner query has 2 parts:

Get all overlapping borrowings not starting on the same day.

Get all borrowings starting on the same date.

select distinct BorrowerID from borrowings
where BorrowerID NOT IN

(
    select b1.BorrowerID from borrowings b1
    inner join borrowings b2
        on b1.BorrowerID = b2.BorrowerID
        and b1.StartDate < b2.StartDate
        and b1.DueDate > b2.StartDate

    union 

    select BorrowerID from borrowings
    group by BorrowerID, StartDate
    having count(*) > 1
)

I had to use 2 separate inner queries as your table doesn't have a unique identifier for each record and using b1.StartDate <= b2.StartDate as I should have makes a record join to itself. It would be good to have a separate identifier for each record.

like image 1
Szymon Avatar answered Oct 19 '22 08:10

Szymon