
PostgreSQL: writing a max() window function with multiple PARTITION BY expressions?

I am trying to get the max value of one column ("original_list_price") over windows defined by two columns: a unique identifier called "address_token" and a date field called "list_date". That is, I would like to know the max "original_list_price" across rows that share both the same address_token AND the same list_date.

E.g.:

SELECT address_token, list_date, original_list_price,
       max(original_list_price) OVER (PARTITION BY address_token, list_date) AS max_list_price
FROM table1;

The query already takes more than 10 minutes when I use just one expression in the PARTITION BY clause (e.g. address_token only), and sometimes it times out. I use Mode Analytics and get this error: "An I/O error occurred while sending to the backend". So my questions are:

1) Will the Window function with multiple PARTITION BY expressions work?

2) Any other way to achieve my desired result?

3) Any way to make window functions, especially the PARTITION BY part, run faster? E.g. use certain data types over others, or avoid long alphanumeric string identifiers?

Thank you!

Laura D asked Oct 18 '22 22:10



1 Answer

The complexity of the window function's partitioning clause should not have a big impact on performance, so partitioning by two expressions works just like partitioning by one. Keep in mind that your query returns every row in the table, so the result set itself may be very large.

Window functions should be able to take advantage of indexes. For this query:

SELECT address_token, list_date, original_list_price, 
       max(original_list_price) OVER (PARTITION BY address_token, list_date) as max_list_price
FROM table1;

You want an index on table1(address_token, list_date, original_list_price).
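
As a minimal sketch, the index could be created and checked like this. The index name is hypothetical; the table and column names come from the query above. Whether the planner actually uses the index depends on table size and statistics, so it is worth inspecting the plan rather than assuming.

```sql
-- Hypothetical index name. Columns follow the PARTITION BY order,
-- with the aggregated column last.
CREATE INDEX idx_table1_token_date_price
    ON table1 (address_token, list_date, original_list_price);

-- Refresh planner statistics after building the index.
ANALYZE table1;

-- Show the chosen plan with actual timings.
-- Note: EXPLAIN with ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT address_token, list_date, original_list_price,
       max(original_list_price) OVER (PARTITION BY address_token, list_date) AS max_list_price
FROM table1;
```

If the plan still shows a full sort on (address_token, list_date), the index is not being used for the partitioning step.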

You could try writing the query as:

select t1.*,
       (select max(t2.original_list_price)
        from table1 t2
        where t2.address_token = t1.address_token and t2.list_date = t1.list_date
       ) as max_list_price
from table1 t1;

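This should start returning results more quickly, because it doesn't have to calculate the window function value for all rows before returning the first ones. With the index on table1(address_token, list_date, original_list_price), each correlated subquery can be answered by an index lookup.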
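
Another rewrite that may be worth comparing is to aggregate once per (address_token, list_date) group and join the result back to the table. This is a sketch that has not been benchmarked against your data; the aliases t1 and m are arbitrary, and it assumes address_token and list_date are never NULL.

```sql
-- Compute the max once per group, then attach it to every row.
SELECT t1.address_token, t1.list_date, t1.original_list_price,
       m.max_list_price
FROM table1 t1
JOIN (
    SELECT address_token, list_date,
           max(original_list_price) AS max_list_price
    FROM table1
    GROUP BY address_token, list_date
) m
  ON m.address_token = t1.address_token
 AND m.list_date = t1.list_date;
```

One difference to be aware of: PARTITION BY places rows with NULL keys in the same partition, while the equality join drops them. If NULL keys can occur, comparing with IS NOT DISTINCT FROM instead of = keeps those rows, possibly at the cost of a less efficient join.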

Gordon Linoff answered Oct 27 '22 11:10
