Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL - consecutive day with min/max - tables contains multiple equal days

Tags:

sql

postgresql

I'm trying and googling since a few hours, but got no solution that works. Here is my issue:

I have a table with following structure: id SERIAL, datum DATE, otherinfo VARCHAR(50)

The data could have equal date entries:

id  datum      otherinfo
1   2019-12-28 testdata1
2   2019-12-28 testdata2
3   2019-12-29 testdata3
4   2019-12-29 testdata4
5   2019-12-31 testdata5-begin longest consecutive days
6   2019-12-31 testdata6
7   2020-01-01 testdata7
8   2020-01-01 testdata8
9   2020-01-02 testdata9
10  2020-01-03 testdata10
11  2020-01-04 testdata11
12  2020-01-04 testdata12
13  2020-01-05 testdata13-end longest consecutive days
14  2020-01-22 testdata14
15  2020-01-29 testdata15
16  2020-01-30 testdata16

I'm interested to get the consecutive days with the start and end date. An output like this:

count | date MIN  | date MAX
6       2019-12-31  2020-01-05
2       2019-12-28  2019-12-29
2       2020-01-29  2020-01-30

I found a few ideas how to solve here on Stackoverflow, but it seems that it always conflicts with multiple equal date entries.

Most success I had with following SQL query:

SELECT COUNT(*) -1 "count", MAX(datum), MIN(datum) FROM (SELECT *, date(datum) - row_number() OVER (PARTITION BY datum ORDER BY date(datum)) * INTERVAL '1 day' "filter" FROM table ) t1 GROUP BY filter HAVING COUNT(*) -1 > 0 ORDER BY count DESC

Sadly it gives wrong counts of consecutive days and the counted days even don't match with the start / end dates.

Many thanks for ideas

Martin

like image 282
marty44 Avatar asked Sep 16 '25 17:09

marty44


2 Answers

This is a Gaps & Islands problem. You can use the traditional solution:

select
  (max(datum) - min(datum)) + 1 as cnt,
  min(datum) as date_min,
  max(datum) as date_max
from (
  select x.*, sum(i) over(order by datum) as g
  from (
    select t.*,
      case when datum > lag(datum) over(order by datum) + 1 
           then 1 else 0 end as i
    from t
  ) x
) y
group by g
like image 157
The Impaler Avatar answered Sep 19 '25 07:09

The Impaler


This is called a gaps & islands problem. A typical way of solving this is numbering the rows (here with DENSE_RANK as the dates are not yet unique) and comparing these numbers to a relative position. As you are dealing with dates, our position is the number of days from a fixed date.

select count(*), min(datum), max(datum)
from
(
  select distinct 
    datum,
    datum - date '1900-01-01' - dense_rank() over (order by datum) as grp
  from mytable
) grouped
group by grp
order by grp;

Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f89e579db31ffd956fdea5d437625b68

If you find it more readable, you can add a step: first make the rows distinct, then run through the rows and get the diff of position and row number (which you can then get with ROW_NUMBER instead of DENSE_RANK), then aggregate.

like image 28
Thorsten Kettner Avatar answered Sep 19 '25 07:09

Thorsten Kettner