Is this correct?
SELECT *
FROM contract
JOIN team USING (name_team)
JOIN player USING(name_player)
WHERE name_team = ?
AND DATE_PART('YEAR',date_join)>= ?
AND DATE_PART('YEAR',date_leave)<= ?
My table contract
has the player name, team name and the dates when he joined and left the club.
I want to make a function listing all players that were on the team in specific years.
The above query doesn't seem to be working ...
You can do this by swapping the ranges if necessary up front. Then, you can detect overlap if the second range start is: less than or equal to the first range end (if ranges are inclusive, containing both the start and end times); or. less than (if ranges are inclusive of start and exclusive of end).
Use the PostgreSQL AGE() function to retrieve the interval between two timestamps or dates. This function takes two arguments: the first is the end date and the second is the start date.
In PostgreSQL, you can use the OVERLAPS operator to test for overlapping time periods. The function returns true when two time periods (defined by their endpoints) overlap, and false when they do not overlap.
Overlap = min(A2, B2) - max(A1, B1) + 1. In other words, the overlap of two integer intervals is a difference between the minimum value of the two upper boundaries and the maximum value of the two lower boundaries, plus 1.
The currently accepted answer does not answer the question. And it is wrong in principle. a BETWEEN x AND y
translates to:
a >= x AND a <= y
Including the upper bound, while people typically need to exclude it:
a >= x AND a < y
With dates you can easily adjust. For the year 2009 use '2009-12-31' as upper bound.
But it's not as simple with timestamps which allow fractional digits. Modern Postgres versions use an 8-byte integer internally to store up to 6 fractional seconds (µs resolution). Knowing this we could still make it work, but that's not intuitive and depends on an implementation detail. Bad idea.
Moreover, a BETWEEN x AND y
does not find overlapping ranges. We need:
b >= x AND a < y
And players that never left are not considered, yet.
Assuming the year 2009
, I'll rephrase the question without changing its meaning:
"Find all players of a given team who joined before 2010 and did not leave before 2009."
Basic query:
SELECT p.*
FROM team t
JOIN contract c USING (name_team)
JOIN player p USING (name_player)
WHERE t.name_team = ?
AND c.date_join < date '2010-01-01'
AND c.date_leave >= date '2009-01-01';
But there is more:
If referential integrity is enforced with FK constraints, the table team
itself is just noise in the query and can be removed.
While the same player can leave and rejoin the same team, we also need to fold possible duplicates, for instance with DISTINCT
.
And we may need to provide for a special case: players that never left. Assuming those players have NULL in date_leave
.
"A player that is not known to have left is assumed to be playing for the team to this day."
Refined query:
SELECT DISTINCT p.*
FROM contract c
JOIN player p USING (name_player)
WHERE c.name_team = ?
AND c.date_join < date '2010-01-01'
AND (c.date_leave >= date '2009-01-01' OR c.date_leave IS NULL);
Operator precedence works against us, AND
binds before OR
. We need parentheses.
Related answer with optimized DISTINCT
(if duplicates are common):
Typically, names of natural persons aren't unique and a surrogate primary key is used. But, obviously, name_player
is the primary key of player
. If all you need is player names we don't need the table player
in the query, either:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND date_join < date '2010-01-01'
AND (date_leave >= date '2009-01-01' OR date_leave IS NULL);
OVERLAPS
operatorThe manual:
OVERLAPS
automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open intervalstart <= time < end
, unlessstart
andend
are equal in which case it represents that single time instant.
To take care of potential NULL
values, COALESCE
seems easiest:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND (date_join, COALESCE(date_leave, CURRENT_DATE)) OVERLAPS
(date '2009-01-01', date '2010-01-01'); -- upper bound excluded
In Postgres 9.2 or later you can also operate with actual range types:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND daterange(date_join, date_leave) &&
daterange '[2009-01-01,2010-01-01)'; -- upper bound excluded
Range types add some overhead and occupy more space. 2 x date
= 8 bytes; 1 x daterange
= 14 bytes on disk or 17 bytes in RAM. But in combination with the overlap operator &&
the query can be supported with a GiST index.
Also, no need to special-case NULL values. NULL means "open range" in a range type - exactly what we need. The table definition doesn't even have to change: we can create the range type on the fly - and support the query with a matching expression index:
CREATE INDEX mv_stock_dr_idx ON mv_stock USING gist (daterange(date_join, date_leave));
Related:
Why not use between without the date part thing:
WHERE datefield BETWEEN '2009-10-10 00:00:00' AND '2009-10-11 00:00:00'
or something like that?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With