On PostgreSQL 9.4, I'm trying to apply what I'll call an "aggregate function" to groups of contiguous rows. Example:
Input data:
recipe  prod1  prod2  timestamp
0       5      4      2015-07-02 08:10:34.357
0       2      7      2015-07-02 08:13:45.352
0       7      0      2015-07-02 08:16:22.098
1       3      2      2015-07-02 08:22:14.678
1       9      4      2015-07-02 08:22:56.123
2       2      6      2015-07-02 08:26:37.564
2       1      7      2015-07-02 08:27:33.109
2       0      8      2015-07-02 08:31:11.687
0       3      5      2015-07-02 08:40:01.345
1       4      2      2015-07-02 08:42:23.210
Desired output:
recipe  prod1_sum  prod2_avg  timestamp_first          timestamp_last
0       14         3.6666     2015-07-02 08:10:34.357  2015-07-02 08:16:22.098
1       12         3          2015-07-02 08:22:14.678  2015-07-02 08:22:56.123
2       3          7          2015-07-02 08:26:37.564  2015-07-02 08:31:11.687
0       3          5          2015-07-02 08:40:01.345  2015-07-02 08:40:01.345
1       4          2          2015-07-02 08:42:23.210  2015-07-02 08:42:23.210
Basically, I want one output line for each "group" of contiguous rows (when the table is sorted on the timestamp column) with the same "recipe" value. In the output, prod1_sum is the sum of prod1 in the "group", prod2_avg is the average of prod2 in the same "group", and the last two columns are respectively the first and the last timestamps in the group. Obviously there can be several distinct groups with the same "recipe" value, and I do want an output row for each of them.
At the moment, I have an ugly way of obtaining this, based on several queries and a lot of data processing outside of the DB, which I really want to avoid and which is not worth showing.
My problem really is the "grouping" of rows. I know how to create an aggregate function which would do what I want, if I could apply it to each group individually. I have looked into window functions, but it seems they would group all values by recipe, which does not respect the "contiguous rows" principle I need.
You can use the following query:
SELECT recipe,
       SUM(prod1) AS prod1_sum,
       AVG(prod2) AS prod2_avg,
       MIN(timestamp) AS timestamp_first,
       MAX(timestamp) AS timestamp_last
FROM (
   SELECT recipe, prod1, prod2, timestamp,
          ROW_NUMBER() OVER (ORDER BY timestamp)
          -
          ROW_NUMBER() OVER (PARTITION BY recipe
                             ORDER BY timestamp) AS grp
   FROM mytable ) t
GROUP BY recipe, grp
ORDER BY timestamp_first;
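The trick here is to use the ROW_NUMBER window function to identify islands of contiguous recipe values. The first ROW_NUMBER numbers all rows by timestamp, and the second numbers the rows within each recipe; their difference stays constant as long as consecutive rows share the same recipe and changes once another recipe intervenes. The calculated field grp is that difference. Because two different recipes can end up with the same grp value, the outer query groups by both recipe and grp.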
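To see what grp looks like for the sample data, here is a minimal sketch that runs the same query from Python. It makes some assumptions that are not part of the original answer: it uses an in-memory SQLite database (via the standard sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for PostgreSQL, stores timestamps as text, and the rn_all / rn_recipe column names are only illustrative.

```python
import sqlite3

# Sample rows from the question: (recipe, prod1, prod2, timestamp).
ROWS = [
    (0, 5, 4, "2015-07-02 08:10:34.357"),
    (0, 2, 7, "2015-07-02 08:13:45.352"),
    (0, 7, 0, "2015-07-02 08:16:22.098"),
    (1, 3, 2, "2015-07-02 08:22:14.678"),
    (1, 9, 4, "2015-07-02 08:22:56.123"),
    (2, 2, 6, "2015-07-02 08:26:37.564"),
    (2, 1, 7, "2015-07-02 08:27:33.109"),
    (2, 0, 8, "2015-07-02 08:31:11.687"),
    (0, 3, 5, "2015-07-02 08:40:01.345"),
    (1, 4, 2, "2015-07-02 08:42:23.210"),
]

# Assumption: SQLite stands in for PostgreSQL; timestamps are ISO-style text,
# so text ordering matches chronological ordering.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (recipe INTEGER, prod1 INTEGER, prod2 INTEGER, timestamp TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# Intermediate step: both row numbers and their difference, row by row.
STEPS_SQL = """
SELECT recipe,
       ROW_NUMBER() OVER (ORDER BY timestamp) AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY recipe ORDER BY timestamp) AS rn_recipe,
       ROW_NUMBER() OVER (ORDER BY timestamp)
       - ROW_NUMBER() OVER (PARTITION BY recipe ORDER BY timestamp) AS grp
FROM mytable
ORDER BY timestamp
"""

# The query from the answer, unchanged apart from layout.
FINAL_SQL = """
SELECT recipe,
       SUM(prod1) AS prod1_sum,
       AVG(prod2) AS prod2_avg,
       MIN(timestamp) AS timestamp_first,
       MAX(timestamp) AS timestamp_last
FROM (
   SELECT recipe, prod1, prod2, timestamp,
          ROW_NUMBER() OVER (ORDER BY timestamp)
          - ROW_NUMBER() OVER (PARTITION BY recipe ORDER BY timestamp) AS grp
   FROM mytable ) t
GROUP BY recipe, grp
ORDER BY timestamp_first
"""

steps = conn.execute(STEPS_SQL).fetchall()
result = conn.execute(FINAL_SQL).fetchall()

print("recipe rn_all rn_recipe grp")
for row in steps:
    print(*row)

print("recipe prod1_sum prod2_avg timestamp_first timestamp_last")
for row in result:
    print(*row)
```

On the sample data, the grp values in timestamp order come out as 0, 0, 0, 3, 3, 5, 5, 5, 5, 7. The three recipe-2 rows and the later single recipe-0 row both get grp = 5, which is why the outer query needs GROUP BY recipe, grp rather than GROUP BY grp alone.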