This question is related to this one. I have a table that contains power values for devices, and I need to calculate the power consumption for a given time span and return the 10 most power-consuming devices. I have generated 192 devices and 7742208 measurement records (40324 for each), which is roughly the number of records the devices would produce in one month.
For this amount of data my current query takes over 40 seconds to execute, which is too much, because the time span and the number of devices and measurements could be much higher. Should I try a different approach than lag() OVER (PARTITION BY ...), and what other optimizations can be made? I would really appreciate suggestions with code examples.
PostgreSQL version 9.4
Query with example values:
SELECT
    t.device_id,
    sum(len_y * extract(epoch FROM len_x)) AS total_consumption
FROM (
    SELECT
        m.id,
        m.device_id,
        m.power_total,
        m.created_at,
        m.power_total + lag(m.power_total) OVER (
            PARTITION BY m.device_id
            ORDER BY m.created_at
        ) AS len_y,
        m.created_at - lag(m.created_at) OVER (
            PARTITION BY m.device_id
            ORDER BY m.created_at
        ) AS len_x
    FROM measurements AS m
    WHERE m.created_at BETWEEN '2015-07-30 13:05:24.403552+00'::timestamp
                           AND '2015-08-27 12:34:59.826837+00'::timestamp
) AS t
GROUP BY t.device_id
ORDER BY total_consumption DESC
LIMIT 10;
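For reference, the inner query produces, for each consecutive pair of samples of a device, the term (P_i + P_{i-1}) * (t_i - t_{i-1}); summing these is the trapezoidal rule for integrating power over time, except for the usual factor of 1/2. Since the missing 1/2 scales every device equally, it does not change the top-10 ranking, only the absolute numbers.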
Table info:
    Column    |           Type           |                         Modifiers
--------------+--------------------------+------------------------------------------------------------
 id           | integer                  | not null default nextval('measurements_id_seq'::regclass)
 created_at   | timestamp with time zone | default timezone('utc'::text, now())
 power_total  | real                     |
 device_id    | integer                  | not null
Indexes:
    "measurements_pkey" PRIMARY KEY, btree (id)
    "measurements_device_id_idx" btree (device_id)
    "measurements_created_at_idx" btree (created_at)
Foreign-key constraints:
    "measurements_device_id_fkey" FOREIGN KEY (device_id) REFERENCES devices(id)
Query plan:
Limit  (cost=1317403.25..1317403.27 rows=10 width=24) (actual time=41077.091..41077.094 rows=10 loops=1)
  ->  Sort  (cost=1317403.25..1317403.73 rows=192 width=24) (actual time=41077.089..41077.092 rows=10 loops=1)
        Sort Key: (sum((((m.power_total + lag(m.power_total) OVER (?))) * date_part('epoch'::text, ((m.created_at - lag(m.created_at) OVER (?)))))))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  GroupAggregate  (cost=1041700.67..1317399.10 rows=192 width=24) (actual time=25361.013..41076.562 rows=192 loops=1)
              Group Key: m.device_id
              ->  WindowAgg  (cost=1041700.67..1201314.44 rows=5804137 width=20) (actual time=25291.797..37839.727 rows=7742208 loops=1)
                    ->  Sort  (cost=1041700.67..1056211.02 rows=5804137 width=20) (actual time=25291.746..30699.993 rows=7742208 loops=1)
                          Sort Key: m.device_id, m.created_at
                          Sort Method: external merge  Disk: 257344kB
                          ->  Seq Scan on measurements m  (cost=0.00..151582.05 rows=5804137 width=20) (actual time=0.333..5112.851 rows=7742208 loops=1)
                                Filter: ((created_at >= '2015-07-30 13:05:24.403552'::timestamp without time zone) AND (created_at <= '2015-08-27 12:34:59.826837'::timestamp without time zone))
Planning time: 0.351 ms
Execution time: 41114.883 ms
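Two generic knobs this plan suggests (a sketch, not benchmarked against this dataset): the sort feeding the WindowAgg spills 257344 kB to disk, so a larger work_mem may turn it into an in-memory sort, and an index matching the window's PARTITION BY/ORDER BY may let the planner read rows already in window order and skip that sort entirely:
-- sketch: raise the per-sort memory budget for the current session only
SET work_mem = '512MB';

-- sketch: composite index matching PARTITION BY device_id ORDER BY created_at
-- (index name is illustrative)
CREATE INDEX measurements_device_id_created_at_idx
    ON measurements (device_id, created_at);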
Query to generate test table and data:
CREATE TABLE measurements (
    id serial PRIMARY KEY,
    device_id integer,
    power_total real,
    created_at timestamp
);

INSERT INTO measurements (
    device_id,
    created_at,
    power_total
)
SELECT
    device_id,
    now() + (i * interval '1 minute'),
    random() * (50 - 1) + 1
FROM (
    SELECT
        DISTINCT(device_id),
        generate_series(0, 10) AS i
    FROM (
        SELECT generate_series(1, 5) AS device_id
    ) AS dev_ids
) AS gen_table;
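To reproduce a dataset of the size quoted at the top (192 devices × 40324 rows = 7742208 records), the same generator can be used with wider series bounds; a sketch:
INSERT INTO measurements (device_id, created_at, power_total)
SELECT
    device_id,
    now() + (i * interval '1 minute'),
    random() * (50 - 1) + 1
FROM (
    SELECT
        DISTINCT(device_id),
        generate_series(0, 40323) AS i  -- 40324 measurements per device
    FROM (
        SELECT generate_series(1, 192) AS device_id  -- 192 devices
    ) AS dev_ids
) AS gen_table;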
Window functions allow many useful query types to be expressed elegantly, including time-series analysis, ranking, percentiles, moving averages, and cumulative sums. Formulating such queries in plain SQL-92 is usually both cumbersome and inefficient.
The difference in execution time can be vast: the window-function approach ran roughly 5 minutes 17 seconds faster and needs far fewer rows for consideration. Window functions are often more efficient than an equivalent cross join or self-join.
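For contrast, a sketch of what the lag() step above would have to look like in SQL-92 style, as a self-join (assuming created_at is unique per device):
-- each row is joined to its device's immediately preceding measurement;
-- the correlated subquery forces a lookup per row, which is why this
-- tends to be far slower than lag() OVER (...)
SELECT cur.device_id,
       cur.power_total + prev.power_total AS len_y,
       cur.created_at - prev.created_at   AS len_x
FROM measurements cur
JOIN measurements prev
  ON  prev.device_id  = cur.device_id
  AND prev.created_at = (
        SELECT max(p.created_at)
        FROM measurements p
        WHERE p.device_id = cur.device_id
          AND p.created_at < cur.created_at
      );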
I would try to move part of the calculation into the row-insertion phase.
Add a new column:
alter table measurements add consumption real;
Backfill the column for existing rows (each device's first row keeps a NULL consumption, since lag() has nothing to look back to; sum() later ignores NULLs):
with m1 as (
    select
        id, power_total, created_at,
        lag(power_total) over (partition by device_id order by created_at) as prev_power_total,
        lag(created_at)  over (partition by device_id order by created_at) as prev_created_at
    from measurements
)
update measurements m2
set consumption =
    (m1.power_total + m1.prev_power_total) *
    extract(epoch from m1.created_at - m1.prev_created_at)
from m1
where m2.id = m1.id;
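A quick sanity check after the backfill (a sketch, not part of the original answer): the stored values should agree exactly with the window expression once it is rounded to real, the column's type, so this should return no rows:
with m1 as (
    select
        id,
        ((power_total + lag(power_total) over w) *
         extract(epoch from created_at - lag(created_at) over w))::real as expected
    from measurements
    window w as (partition by device_id order by created_at)
)
select m.id
from measurements m
join m1 on m1.id = m.id
where m.consumption is distinct from m1.expected
limit 10;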
Create a trigger to maintain the column for newly inserted rows:
create or replace function before_insert_on_measurements()
returns trigger language plpgsql
as $$
declare
    rec record;
begin
    -- fetch the device's latest existing measurement;
    -- assumes rows arrive in chronological order per device
    select power_total, created_at into rec
    from measurements
    where device_id = new.device_id
    order by created_at desc
    limit 1;

    -- for a device's first row rec holds NULLs, so consumption
    -- stays NULL, matching the lag()-based backfill above
    new.consumption :=
        (new.power_total + rec.power_total) *
        extract(epoch from new.created_at - rec.created_at);
    return new;
end $$;

create trigger before_insert_on_measurements
before insert on measurements
for each row execute procedure before_insert_on_measurements();
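A quick way to see the trigger at work (a sketch with made-up values; it assumes device 1 already has measurements):
-- insert a new reading; the trigger derives consumption from the
-- device's latest existing measurement
insert into measurements (device_id, power_total, created_at)
values (1, 42.0, now());

select device_id, power_total, created_at, consumption
from measurements
where device_id = 1
order by created_at desc
limit 1;
Note that the trigger's per-insert lookup itself benefits from the composite (device_id, created_at) index suggested earlier; with only the single-column indexes it still works, just with a costlier lookup on each insert.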
The query:
select device_id, sum(consumption) as total_consumption
from measurements
-- where conditions
group by device_id
order by total_consumption desc
limit 10;
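With consumption precomputed, the expensive sort and window pass are gone and only a plain aggregate remains. If the time-span filter is always present, a multicolumn index covering everything the query touches may allow an index-only scan (PostgreSQL 9.2+); a sketch, with an illustrative name:
-- sketch: serves the created_at range filter and carries the columns the
-- aggregate reads, so the heap may not need to be visited at all
create index measurements_span_idx
    on measurements (created_at, device_id, consumption);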