Find and sum date ranges with overlapping records in postgresql

Tags: postgresql, ruby-on-rails

I have a large dataset in which I want to sum a count across records whose time ranges overlap. For example, given the data

[
  {"id": 1, "name": 'A', "start": '2018-12-10 00:00:00', "end": '2018-12-20 00:00:00', count: 34},
  {"id": 2, "name": 'B', "start": '2018-12-16 00:00:00', "end": '2018-12-27 00:00:00', count: 19},
  {"id": 3, "name": 'C', "start": '2018-12-16 00:00:00', "end": '2018-12-20 00:00:00', count: 56},
  {"id": 4, "name": 'D', "start": '2018-12-25 00:00:00', "end": '2018-12-30 00:00:00', count: 43}
]

You can see there are 2 periods where activities overlap. I want to return the total count for each of these 'overlaps', based on the activities involved in the overlap. So the above would output something like:

[
  {start:'2018-12-16', end: '2018-12-20', overlap_ids:[1,2,3], total_count: 109},
  {start:'2018-12-25', end: '2018-12-27', overlap_ids:[2,4], total_count: 62},
]

The question is how to generate this with a Postgres query. I was looking into generate_series and then working out which activity falls into each interval, but that's not quite right because the data is continuous. I really need to identify the exact overlapping time and then sum the overlapping activities.

EDIT: I have added another example. As @SRack pointed out, since A, B and C overlap, B and C, A and B, and A and C also overlap. This doesn't matter, since the output I'm looking for is an array of date ranges that contain overlapping activities rather than all the unique combinations of overlaps. Also note that the dates are timestamps, so they will have millisecond precision and won't necessarily all be at 00:00:00. If it helps, there would probably be a WHERE condition on the total count, for example to see only results where the total count is > 100.

asked Jan 25 '19 16:01

Dave

1 Answer

demo:db<>fiddle (uses the old data set with the overlapping A-B-part)

Disclaimer: This works for day intervals, not for timestamps. The requirement for timestamps came later.

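SELECT
    s.acts,
    s.sum,
    MAX(a.start) AS start,   -- the overlap begins at the latest start of the activities involved
    MIN(a."end") AS "end"    -- and finishes at the earliest end
FROM (
    SELECT DISTINCT ON (acts)
        array_agg(name ORDER BY name) AS acts,  -- fixed order so equal sets compare equal
        SUM(count) AS sum
    FROM
        activities, generate_series(start, "end", interval '1 day') gs
    GROUP BY gs
    HAVING cardinality(array_agg(name)) > 1
) s
JOIN activities a
ON a.name = ANY(s.acts)
GROUP BY s.acts, s.sum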

1. generate_series generates all dates between start and end, so every date on which an activity exists gets one row with that activity's count.
2. Group by date, aggregating the names of all activities on that date and summing their counts.
3. HAVING filters out the dates on which only one activity exists.
4. Because different days can have the same set of activities, only one representative per set is needed: DISTINCT ON removes the duplicates.
5. Join this result against the original table to get the start and end. (Note that "end" is a reserved word in Postgres, so you had better choose another column name!) It was more convenient to drop these columns earlier, but it is also possible to get this data within the subquery.
6. Group the joined rows by activity set to get the start and end of each interval.
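
The first three steps above can be tried in isolation. The following is a minimal sketch, not part of the original answer: it inlines the four sample rows from the question in a CTE named activities so that it runs without creating a table, and the aliases day_ts, day, acts and total_count are chosen here purely for illustration.

```sql
-- Sample rows copied from the question; the CTE stands in for the real activities table.
WITH activities(id, name, start, "end", count) AS (
    VALUES
        (1, 'A', timestamp '2018-12-10 00:00:00', timestamp '2018-12-20 00:00:00', 34),
        (2, 'B', timestamp '2018-12-16 00:00:00', timestamp '2018-12-27 00:00:00', 19),
        (3, 'C', timestamp '2018-12-16 00:00:00', timestamp '2018-12-20 00:00:00', 56),
        (4, 'D', timestamp '2018-12-25 00:00:00', timestamp '2018-12-30 00:00:00', 43)
)
SELECT
    day_ts::date                  AS day,          -- one output row per calendar day
    array_agg(name ORDER BY name) AS acts,         -- activities present on that day
    sum(count)                    AS total_count   -- their summed counts
FROM activities
-- Expand every activity into one row per day it covers (the end day is included).
CROSS JOIN LATERAL generate_series(start, "end", interval '1 day') AS gs(day_ts)
GROUP BY day_ts
HAVING count(*) > 1                                -- keep only days with at least two activities
ORDER BY day_ts;
```

With the sample rows this should return one row per day from 2018-12-16 to 2018-12-20 with {A,B,C} and 109, and one row per day from 2018-12-25 to 2018-12-27 with {B,D} and 62. Because generate_series includes the end value, the end day of an activity still counts as occupied, which is one reason the day-based approach does not carry over to exact timestamps.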

Here's a version for timestamps:

demo:db<>fiddle

WITH timeslots AS (
    SELECT * FROM (
        SELECT
            tsrange(timepoint, lead(timepoint) OVER (ORDER BY timepoint)),
            lead(timepoint) OVER (ORDER BY timepoint)     -- 2
        FROM (
            SELECT 
                unnest(ARRAY[start, "end"]) as timepoint  -- 1 
            FROM
                activities
            ORDER BY timepoint
        ) s
    )s  WHERE lead IS NOT NULL                            -- 3
)
SELECT 
    GREATEST(MAX(start), lower(tsrange)),                 -- 6
    LEAST(MIN("end"), upper(tsrange)),
    array_agg(name),                                      -- 5
    sum(count)
FROM 
    timeslots t
JOIN activities a
ON t.tsrange && tsrange(a.start, a.end)                   -- 4
GROUP BY tsrange
HAVING cardinality(array_agg(name)) > 1

The main idea is to identify the possible time slots. I take every known time (both start and end) and put them into a sorted list. Then I take the first two known times (17:00 from the start of A and 18:00 from the start of B) and check which activities fall into that slot. I do the same for the 2nd and 3rd, then for the 3rd and 4th, and so on.

In the first time slot only A fits. In the second, from 18 to 19, B fits as well. In the next slot, 19 to 20, C fits too. From 20 to 20:30 A no longer fits, only B and C. The next one is 20:30 to 22, where only B fits. From 22 to 23 D is added to B, and finally only D fits into 23 to 23:30.

So I take this list of time slots and join it against the activities table wherever the intervals intersect. After that it's only a matter of grouping by time slot and summing up the count.
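
To make the slot construction concrete, here is a small sketch that lists the candidate slots for the first data set in the question. It is an illustration under assumptions rather than the answer's exact query: the CTE names activities and timepoints and the aliases slot_start and slot_end are hypothetical, and it collects the time points with UNION instead of unnest, which also removes duplicate time points.

```sql
-- Sample rows copied from the question; the CTE stands in for the real activities table.
WITH activities(id, name, start, "end", count) AS (
    VALUES
        (1, 'A', timestamp '2018-12-10 00:00:00', timestamp '2018-12-20 00:00:00', 34),
        (2, 'B', timestamp '2018-12-16 00:00:00', timestamp '2018-12-27 00:00:00', 19),
        (3, 'C', timestamp '2018-12-16 00:00:00', timestamp '2018-12-20 00:00:00', 56),
        (4, 'D', timestamp '2018-12-25 00:00:00', timestamp '2018-12-30 00:00:00', 43)
),
timepoints AS (
    -- Every start and every end becomes one candidate slot boundary.
    SELECT start AS timepoint FROM activities
    UNION                       -- UNION (not UNION ALL) drops repeated boundaries
    SELECT "end" FROM activities
)
SELECT
    timepoint                                 AS slot_start,
    lead(timepoint) OVER (ORDER BY timepoint) AS slot_end   -- next boundary, NULL on the last row
FROM timepoints
ORDER BY slot_start;
```

For the sample rows the boundaries are 2018-12-10, 12-16, 12-20, 12-25, 12-27 and 12-30, so the query should list five bounded slots plus a final row whose slot_end is NULL. That last row is the one the `WHERE lead IS NOT NULL` filter in the answer's query removes.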

1. This puts both timestamps of a row into one array, whose elements are then expanded into one row per element with unnest. That gives all times in one column, which can simply be ordered.
2. The lead window function brings the value of the next row into the current one, so a timestamp range can be built from both values with tsrange.
3. This filter is necessary because the last row has no "next value". lead returns NULL there, which tsrange interprets as infinity, so this would create an incorrect, unbounded time slot. That row must be filtered out.
4. Join the time slots against the original table. The && operator checks whether two ranges overlap.
5. Group by time slot, aggregating the names and summing the count. The HAVING clause filters out the time slots with only one activity.
6. Getting the right start and end points is a little tricky. The start point is the greater of the maximum activity start and the beginning of the time slot (which can be obtained with lower). For example, take the 20 to 20:30 slot: it begins at 20:00, but neither B nor C starts there. The end time is handled similarly.
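
The question's edit also mentions filtering on the total count. One possible way to fold that in is sketched below, as a variation rather than the answer's own query: the threshold of 100 comes from the question, while the CTE names timepoints and timeslots, the aliases overlap_start, overlap_end, overlap_ids and total_count, and the use of UNION in place of unnest are assumptions made for this example. The sample rows are inlined so the statement runs on its own.

```sql
-- Sample rows copied from the question; the CTE stands in for the real activities table.
WITH activities(id, name, start, "end", count) AS (
    VALUES
        (1, 'A', timestamp '2018-12-10 00:00:00', timestamp '2018-12-20 00:00:00', 34),
        (2, 'B', timestamp '2018-12-16 00:00:00', timestamp '2018-12-27 00:00:00', 19),
        (3, 'C', timestamp '2018-12-16 00:00:00', timestamp '2018-12-20 00:00:00', 56),
        (4, 'D', timestamp '2018-12-25 00:00:00', timestamp '2018-12-30 00:00:00', 43)
),
timepoints AS (
    -- Distinct slot boundaries: every start and every end.
    SELECT start AS timepoint FROM activities
    UNION
    SELECT "end" FROM activities
),
timeslots AS (
    -- Pair each boundary with the next one; the last boundary has no successor and is dropped.
    SELECT tsrange(timepoint, next_timepoint) AS slot
    FROM (
        SELECT timepoint,
               lead(timepoint) OVER (ORDER BY timepoint) AS next_timepoint
        FROM timepoints
    ) p
    WHERE next_timepoint IS NOT NULL
)
SELECT
    lower(t.slot)                 AS overlap_start,
    upper(t.slot)                 AS overlap_end,
    array_agg(a.id ORDER BY a.id) AS overlap_ids,
    sum(a.count)                  AS total_count
FROM timeslots t
JOIN activities a
  ON t.slot && tsrange(a.start, a."end")   -- activity is running during this slot
GROUP BY t.slot
HAVING count(*) > 1                        -- at least two activities overlap
   AND sum(a.count) > 100                  -- threshold taken from the question's edit
ORDER BY overlap_start;
```

For the sample rows this should leave a single row: 2018-12-16 to 2018-12-20 with ids {1,2,3} and a total of 109, since the {2,4} overlap only sums to 62. The sketch reports the slot bounds directly: every slot lies between two consecutive boundaries, so any activity that overlaps a slot covers it completely.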

answered Oct 22 '22 23:10

S-Man
