Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find overlapping date ranges in PostgreSQL

Is this correct?

SELECT * 
FROM   contract 
JOIN   team USING (name_team) 
JOIN   player USING(name_player) 
WHERE  name_team = ? 
AND    DATE_PART('YEAR',date_join)>= ? 
AND    DATE_PART('YEAR',date_leave)<= ?

My table contract has the player name, team name and the dates when he joined and left the club.
I want to make a function listing all players that were on the team in specific years.
The above query doesn't seem to be working ...

like image 758
aocferreira Avatar asked Dec 18 '10 23:12

aocferreira


People also ask

How do you calculate overlapping date range in SQL?

You can do this by swapping the ranges if necessary up front. Then, you can detect overlap if the second range start is: less than or equal to the first range end (if ranges are inclusive, containing both the start and end times); or. less than (if ranges are inclusive of start and exclusive of end).

How do you check if a date is between two dates in PostgreSQL?

Use the PostgreSQL AGE() function to retrieve the interval between two timestamps or dates. This function takes two arguments: the first is the end date and the second is the start date.

What is Postgres overlap?

In PostgreSQL, you can use the OVERLAPS operator to test for overlapping time periods. The function returns true when two time periods (defined by their endpoints) overlap, and false when they do not overlap.

How do you calculate overlap period?

Overlap = min(A2, B2) - max(A1, B1) + 1. In other words, the overlap of two integer intervals is a difference between the minimum value of the two upper boundaries and the maximum value of the two lower boundaries, plus 1.


2 Answers

The currently accepted answer does not answer the question. And it is wrong in principle. a BETWEEN x AND y translates to:

a >= x AND a <= y

Including the upper bound, while people typically need to exclude it:

a >= x AND a < y

With dates you can easily adjust. For the year 2009 use '2009-12-31' as upper bound.
But it's not as simple with timestamps which allow fractional digits. Modern Postgres versions use an 8-byte integer internally to store up to 6 fractional seconds (µs resolution). Knowing this we could still make it work, but that's not intuitive and depends on an implementation detail. Bad idea.

Moreover, a BETWEEN x AND y does not find overlapping ranges. We need:

b >= x AND a < y

And players that never left are not considered, yet.

Proper answer

Assuming the year 2009, I'll rephrase the question without changing its meaning:

"Find all players of a given team who joined before 2010 and did not leave before 2009."

Basic query:

SELECT p.*
FROM   team     t
JOIN   contract c USING (name_team) 
JOIN   player   p USING (name_player) 
WHERE  t.name_team = ? 
AND    c.date_join  <  date '2010-01-01'
AND    c.date_leave >= date '2009-01-01';

But there is more:

If referential integrity is enforced with FK constraints, the table team itself is just noise in the query and can be removed.

While the same player can leave and rejoin the same team, we also need to fold possible duplicates, for instance with DISTINCT.

And we may need to provide for a special case: players that never left. Assuming those players have NULL in date_leave.

"A player that is not known to have left is assumed to be playing for the team to this day."

Refined query:

SELECT DISTINCT p.* 
FROM   contract c
JOIN   player   p USING (name_player) 
WHERE  c.name_team = ? 
AND    c.date_join  <  date '2010-01-01'
AND   (c.date_leave >= date '2009-01-01' OR c.date_leave IS NULL);

Operator precedence works against us, AND binds before OR. We need parentheses.

Related answer with optimized DISTINCT (if duplicates are common):

  • Many to Many Table - Performance is bad

Typically, names of natural persons aren't unique and a surrogate primary key is used. But, obviously, name_player is the primary key of player. If all you need is player names we don't need the table player in the query, either:

SELECT DISTINCT name_player 
FROM   contract
WHERE  name_team = ? 
AND    date_join  <  date '2010-01-01'
AND   (date_leave >= date '2009-01-01' OR date_leave IS NULL);

SQL OVERLAPS operator

The manual:

OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant.

To take care of potential NULL values, COALESCE seems easiest:

SELECT DISTINCT name_player 
FROM   contract
WHERE  name_team = ? 
AND    (date_join, COALESCE(date_leave, CURRENT_DATE)) OVERLAPS
       (date '2009-01-01', date '2010-01-01');  -- upper bound excluded

Range type with index support

In Postgres 9.2 or later you can also operate with actual range types:

SELECT DISTINCT name_player 
FROM   contract
WHERE  name_team = ? 
AND    daterange(date_join, date_leave) &&
       daterange '[2009-01-01,2010-01-01)';  -- upper bound excluded

Range types add some overhead and occupy more space. 2 x date = 8 bytes; 1 x daterange = 14 bytes on disk or 17 bytes in RAM. But in combination with the overlap operator && the query can be supported with a GiST index.

Also, no need to special-case NULL values. NULL means "open range" in a range type - exactly what we need. The table definition doesn't even have to change: we can create the range type on the fly - and support the query with a matching expression index:

CREATE INDEX mv_stock_dr_idx ON mv_stock USING gist (daterange(date_join, date_leave));

Related:

  • Average stock history table
like image 51
Erwin Brandstetter Avatar answered Oct 05 '22 07:10

Erwin Brandstetter


Why not use between without the date part thing:

WHERE datefield BETWEEN '2009-10-10 00:00:00' AND '2009-10-11 00:00:00'

or something like that?

like image 29
Scott Marlowe Avatar answered Oct 05 '22 07:10

Scott Marlowe