Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres windowing (determine contiguous days)

Using Postgres 9.3, I'm trying to count the number of contiguous days of a certain weather type. If we assume we have a regular time series and weather report:

date|weather
"2016-02-01";"Sunny"
"2016-02-02";"Cloudy"
"2016-02-03";"Snow"
"2016-02-04";"Snow"
"2016-02-05";"Cloudy"
"2016-02-06";"Sunny"
"2016-02-07";"Sunny"
"2016-02-08";"Sunny"
"2016-02-09";"Snow"
"2016-02-10";"Snow"

I want something count the contiguous days of the same weather. The results should look something like this:

date|weather|contiguous_days 
"2016-02-01";"Sunny";1
"2016-02-02";"Cloudy";1
"2016-02-03";"Snow";1
"2016-02-04";"Snow";2
"2016-02-05";"Cloudy";1
"2016-02-06";"Sunny";1
"2016-02-07";"Sunny";2
"2016-02-08";"Sunny";3
"2016-02-09";"Snow";1
"2016-02-10";"Snow";2

I've been banging my head on this for a while trying to use windowing functions. At first, it seems like it should be no-brainer, but then I found out its much harder than expected.

Here is what I've tried...

Select date, weather, Row_Number() Over (partition by weather order by date)
  from t_weather

Would it be better just easier to compare the current row to the next? How would you do that while maintaining a count? Any thoughts, ideas, or even solutions would be helpful! -Kip

like image 392
Kip Avatar asked Feb 18 '16 20:02

Kip


4 Answers

You need to identify the contiguous where the weather is the same. You can do this by adding a grouping identifier. There is a simple method: subtract a sequence of increasing numbers from the dates and it is constant for contiguous dates.

One you have the grouping, the rest is row_number():

Select date, weather,
       Row_Number() Over (partition by weather, grp order by date)
from (select w.*, 
             (date - row_number() over (partition by weather order by date) * interval '1 day') as grp
      from t_weather w
     ) w;

The SQL Fiddle is here.

like image 56
Gordon Linoff Avatar answered Oct 17 '22 01:10

Gordon Linoff


I'm not sure what the query engine is going to do when scanning multiple times across the same data set (kinda like calculating area under a curve), but this works...

WITH v(date, weather) AS (
VALUES 
('2016-02-01'::date,'Sunny'::text),
('2016-02-02','Cloudy'),
('2016-02-03','Snow'),
('2016-02-04','Snow'),
('2016-02-05','Cloudy'),
('2016-02-06','Sunny'),
('2016-02-07','Sunny'),
('2016-02-08','Sunny'),
('2016-02-09','Snow'),
('2016-02-10','Snow') ),
changes AS (
SELECT date, 
    weather, 
    CASE WHEN lag(weather) OVER () = weather THEN 1 ELSE 0 END change
FROM v)
SELECT date
    , weather
    ,(SELECT count(weather) -- number of times the weather didn't change
      FROM changes v2 
      WHERE v2.date <= v1.date AND v2.weather = v1.weather
        AND v2.date >= ( -- bounded between changes of weather
            SELECT max(date) 
            FROM changes v3 
            WHERE change = 0 
            AND v3.weather = v1.weather 
            AND v3.date <= v1.date)  --<-- here's the expensive part
    ) curve
FROM changes v1
like image 28
Kirk Roybal Avatar answered Oct 17 '22 00:10

Kirk Roybal


Here is another approach based off of this answer.

First we add a change column that is 1 or 0 depending on whether the weather is different or not from the previous day.
Then we introduce a group_nr column by summing the change over an order by date. This produces a unique group number for each sequence of consecutive same-weather days since the sum is only incremented on the first day of each sequence.
Finally we do a row_number() over (partition by group_nr order by date) to produce the running count per group.

select date, weather, row_number() over (partition by group_nr order by date)
from (
  select *, sum(change) over (order by date) as group_nr
  from (
    select *, (weather != lag(weather,1,'') over (order by date))::int as change
    from tmp_weather
  ) t1
) t2;

sqlfiddle (uses equivalent WITH syntax)

like image 36
Kenney Avatar answered Oct 16 '22 23:10

Kenney


You can accomplish this with a recursive CTE as follows:

WITH RECURSIVE CTE_ConsecutiveDays AS
(
    SELECT
        my_date,
        weather,
        1 AS consecutive_days
    FROM My_Table T
    WHERE
        NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.my_date = T.my_date - INTERVAL '1 day' AND T2.weather = T.weather)
    UNION ALL
    SELECT
        T.my_date,
        T.weather,
        CD.consecutive_days + 1
    FROM
        CTE_ConsecutiveDays CD
    INNER JOIN My_Table T ON
        T.my_date = CD.my_date + INTERVAL '1 day' AND
        T.weather = CD.weather
)
SELECT *
FROM CTE_ConsecutiveDays
ORDER BY my_date;

Here's the SQL Fiddle to test: http://www.sqlfiddle.com/#!15/383e5/3

like image 42
Tom H Avatar answered Oct 17 '22 01:10

Tom H