Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Closest record for a series of dates

I know that to get the closest record prior to a date I can use the query:

select * 
from results 
where resulttime = (select max(resulttime) 
                    from results 
                    where some_id = 15 
                      and resulttime < '2012-07-27');

But I need to do this for a series of days, so that I know the closest record for each day. Any ideas?

The series of days would be generated by generate_sequence().

The closest prior record may be in a prior day to what we want the value for, but still need to be returned.

like image 856
root Avatar asked Jul 26 '12 08:07

root


2 Answers

Should be simplest & fastest with a LEFT JOIN and DISTINCT ON:

WITH x(search_ts) AS (
    VALUES
     ('2012-07-26 20:31:29'::timestamp)              -- search timestamps
    ,('2012-05-14 19:38:21')
    ,('2012-05-13 22:24:10')
    )
SELECT DISTINCT ON (x.search_ts)
       x.search_ts, r.id, r.resulttime
FROM   x
LEFT   JOIN results r ON r.resulttime <= x.search_ts -- smaller or same
-- WHERE some_id = 15                                -- some condition?
ORDER  BY x.search_ts, r.resulttime DESC;

Result (dummy values):

search_ts           | id     | resulttime
--------------------+--------+----------------
2012-05-13 22:24:10 | 404643 | 2012-05-13 22:24:10
2012-05-14 19:38:21 | 404643 | 2012-05-13 22:24:10
2012-07-26 20:31:29 | 219822 | 2012-07-25 19:47:44

I use a CTE to provide the values, could be a table or function or unnested array or a set generated with generate_series() something else as well. (Did you mean generate_series() by "generate_sequence()"?)

First I JOIN the search timestamps to all rows in the table with earlier or equal resulttime. I use LEFT JOIN instead of JOIN so that search timestamps are not dropped when there is no prior resulttime in the table at all.

With DISTINCT ON (x.search_ts) in combination with ORDER BY x.search_ts, r.resulttime DESC we get the greatest (or one of the equally greatest) resulttime that is smaller or equal to each search timestamp.

like image 144
Erwin Brandstetter Avatar answered Oct 21 '22 08:10

Erwin Brandstetter


A simple subquery to identify the largest date within a day boundary from a time series similar to your example would be something like this

SELECT max(resulttime), date_trunc('days',resulttime) FROM results GROUP BY 2;

that is pinning the timestamp to just the day component using the date_trunc function, and then collecting the largest timestamp for each truncated date, using GROUP BY max()

like image 25
cms Avatar answered Oct 21 '22 09:10

cms