I'm trying and googling since a few hours, but got no solution that works. Here is my issue:
I have a table with following structure: id SERIAL, datum DATE, otherinfo VARCHAR(50)
The data could have equal date entries:
id  datum      otherinfo
1   2019-12-28 testdata1
2   2019-12-28 testdata2
3   2019-12-29 testdata3
4   2019-12-29 testdata4
5   2019-12-31 testdata5-begin longest consecutive days
6   2019-12-31 testdata6
7   2020-01-01 testdata7
8   2020-01-01 testdata8
9   2020-01-02 testdata9
10  2020-01-03 testdata10
11  2020-01-04 testdata11
12  2020-01-04 testdata12
13  2020-01-05 testdata13-end longest consecutive days
14  2020-01-22 testdata14
15  2020-01-29 testdata15
16  2020-01-30 testdata16
I'm interested to get the consecutive days with the start and end date. An output like this:
count | date MIN  | date MAX
6       2019-12-31  2020-01-05
2       2019-12-28  2019-12-29
2       2020-01-29  2020-01-30
I found a few ideas how to solve here on Stackoverflow, but it seems that it always conflicts with multiple equal date entries.
Most success I had with following SQL query:
SELECT COUNT(*) -1 "count", MAX(datum), MIN(datum) FROM (SELECT *, date(datum) - row_number() OVER (PARTITION BY datum ORDER BY date(datum)) * INTERVAL '1 day' "filter" FROM table ) t1 GROUP BY filter HAVING COUNT(*) -1 > 0 ORDER BY count DESC
Sadly it gives wrong counts of consecutive days and the counted days even don't match with the start / end dates.
Many thanks for ideas
Martin
This is a Gaps & Islands problem. You can use the traditional solution:
select
  (max(datum) - min(datum)) + 1 as cnt,
  min(datum) as date_min,
  max(datum) as date_max
from (
  select x.*, sum(i) over(order by datum) as g
  from (
    select t.*,
      case when datum > lag(datum) over(order by datum) + 1 
           then 1 else 0 end as i
    from t
  ) x
) y
group by g
                        This is called a gaps & islands problem. A typical way of solving this is numbering the rows (here with DENSE_RANK as the dates are not yet unique) and comparing these numbers to a relative position. As you are dealing with dates, our position is the number of days from a fixed date.
select count(*), min(datum), max(datum)
from
(
  select distinct 
    datum,
    datum - date '1900-01-01' - dense_rank() over (order by datum) as grp
  from mytable
) grouped
group by grp
order by grp;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f89e579db31ffd956fdea5d437625b68
If you find it more readable, you can add a step: first make the rows distinct, then run through the rows and get the diff of position and row number (which you can then get with ROW_NUMBER instead of DENSE_RANK), then aggregate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With