Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summarize values across timeline in SQL

The Problem

I have a PostgreSQL database on which I am trying to summarize the revenue of a cash register over time. The cash register can either have status ACTIVE or INACTIVE, but I only want to summarize the earnings created when it was ACTIVE for a given period of time.

I have two tables; one that marks the revenue and one that marks the cash register status:

CREATE TABLE counters
(
  id bigserial NOT NULL,
  "timestamp" timestamp with time zone,
  total_revenue bigint,
  id_of_machine character varying(50),
  CONSTRAINT counters_pkey PRIMARY KEY (id)
)

CREATE TABLE machine_lifecycle_events
(
  id bigserial NOT NULL,
  event_type character varying(50),
  "timestamp" timestamp with time zone,
  id_of_affected_machine character varying(50),
  CONSTRAINT machine_lifecycle_events_pkey PRIMARY KEY (id)
)

A counters entry is added every 1 minute and total_revenue only increases. A machine_lifecycle_events entry is added every time the status of the machine changes.

I have added an image illustrating the problem. It is the revenue during the blue periods which should be summarized.

Timeline showing problem.

What I have tried so far

I have created a query which can give me the total revenue in a given instant:

SELECT total_revenue 
  FROM counters 
 WHERE timestamp < '2014-03-05 11:00:00' 
       AND id_of_machine='1' 
ORDER BY 
       timestamp desc 
 LIMIT 1

The questions

  1. How do I calculate the revenue earned between two timestamps?
  2. How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_lifecycle_events with the input period?

Any ideas on how to attack this problem?

Update

Example data:

INSERT INTO counters VALUES
   (1,  '2014-03-01 00:00:00', 100,  '1')
 , (2,  '2014-03-01 12:00:00', 200,  '1')
 , (3,  '2014-03-02 00:00:00', 300,  '1')
 , (4,  '2014-03-02 12:00:00', 400,  '1')
 , (5,  '2014-03-03 00:00:00', 500,  '1')
 , (6,  '2014-03-03 12:00:00', 600,  '1')
 , (7,  '2014-03-04 00:00:00', 700,  '1')
 , (8,  '2014-03-04 12:00:00', 800,  '1')
 , (9,  '2014-03-05 00:00:00', 900,  '1')
 , (10, '2014-03-05 12:00:00', 1000, '1')
 , (11, '2014-03-06 00:00:00', 1100, '1')
 , (12, '2014-03-06 12:00:00', 1200, '1')
 , (13, '2014-03-07 00:00:00', 1300, '1')
 , (14, '2014-03-07 12:00:00', 1400, '1');

INSERT INTO machine_lifecycle_events VALUES
   (1, 'ACTIVE',   '2014-03-01 08:00:00', '1')
 , (2, 'INACTIVE', '2014-03-03 00:00:00', '1')
 , (3, 'ACTIVE',   '2014-03-05 00:00:00', '1')
 , (4, 'INACTIVE', '2014-03-06 12:00:00', '1');

SQL Fiddle with sample data.

Example query:
The revenue between '2014-03-02 08:00:00' and '2014-03-06 08:00:00' is 300. 100 for the first ACTIVE period, and 200 for the second ACTIVE period.

like image 905
uldall Avatar asked Mar 07 '14 15:03

uldall


1 Answers

DB design

To make my work easier I sanitized your DB design before I tackled the questions:

CREATE TEMP TABLE counter (
    id            bigserial PRIMARY KEY
  , ts            timestamp NOT NULL
  , total_revenue bigint NOT NULL
  , machine_id    int NOT NULL
);

CREATE TEMP TABLE machine_event (
    id            bigserial PRIMARY KEY
  , ts            timestamp NOT NULL
  , machine_id    int NOT NULL
  , status_active bool NOT NULL
);

Test case in the fiddle.

Major points

  • Using ts instead of "timestamp". Never use basic type names as column names.
  • Simplified & unified the name machine_id and made it out to be integer as it should be, instead of varchar(50).
  • event_type varchar(50) should be an integer foreign key, too, or an enum. Or even just a boolean for only active / inactive. Simplified to status_active bool.
  • Simplified and sanitized INSERT statements as well.

Answers

Assumptions

  • total_revenue only increases (per question).
  • Borders of the outer time frame are included.
  • Every "next" row per machine in machine_event has the opposite status_active.

1. How do I calculate the revenue earned between two timestamps?

WITH span AS (
   SELECT '2014-03-02 12:00'::timestamp AS s_from  -- start of time range
        , '2014-03-05 11:00'::timestamp AS s_to    -- end of time range
   )
SELECT machine_id, s.s_from, s.s_to
     , max(total_revenue) - min(total_revenue) AS earned
FROM   counter c
     , span s
WHERE  ts BETWEEN s_from AND s_to                  -- borders included!
AND    machine_id =  1
GROUP  BY 1,2,3;

2. How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_event with the input period?

This query for all machines in the given time frame (span).
Add WHERE machine_id = 1 in the CTE cte to select a specific machine.

WITH span AS (
   SELECT '2014-03-02 08:00'::timestamp AS s_from  -- start of time range
        , '2014-03-06 08:00'::timestamp AS s_to    -- end of time range
   )
, cte AS (
   SELECT machine_id, ts, status_active, s_from
        , lead(ts, 1, s_to) OVER w AS period_end
        , first_value(ts)   OVER w AS first_ts
   FROM   span          s
   JOIN   machine_event e ON e.ts BETWEEN s.s_from AND s.s_to
   WINDOW w AS (PARTITION BY machine_id ORDER BY ts)
   )
SELECT machine_id, ts AS period_start, period_end -- start in time frame
FROM   cte
WHERE  status_active

UNION  ALL                             -- active start before time frame
SELECT machine_id, s_from, ts
FROM   cte
WHERE  NOT status_active
AND    ts =  first_ts
AND    ts <> s_from

UNION  ALL       -- active start before time frame, no end in time frame
SELECT machine_id, s_from, s_to
FROM  (
   SELECT DISTINCT ON (1)
          e.machine_id, e.status_active, s.s_from, s.s_to
   FROM   span          s
   JOIN   machine_event e ON e.ts < s.s_from  -- only from before time range
   LEFT   JOIN cte c USING (machine_id)
   WHERE  c.machine_id IS NULL                -- not in selected time range
   ORDER  BY e.machine_id, e.ts DESC          -- only the latest entry
   ) sub
WHERE  status_active -- only if active
ORDER  BY 1, 2;

Result is the list of blue periods in your image.
SQL Fiddle demonstrating both.

Recent similar question:
Sum of time difference between rows

like image 157
Erwin Brandstetter Avatar answered Oct 18 '22 21:10

Erwin Brandstetter