Aggregate function to detect trend in PostgreSQL

Tags: sql, database, select, postgresql, aggregate-functions

I'm using a PostgreSQL database to store a data structure like so:

datapoint(userId, rank, timestamp)

where timestamp is a Unix epoch timestamp in milliseconds.

In this structure I store the rank of each user each day, so it's like:

UserId   Rank  Timestamp
1        1     1435366459
1        2     1435366458
1        3     1435366457
2        8     1435366456
2        6     1435366455
2        7     1435366454

So, in the sample data above, userId 1 is improving its rank with each measurement, which means it has a positive trend, while userId 2 is dropping in rank, which means it has a negative trend.

What I need to do is to detect all users that have a positive trend based on the last N measurements.

asked Feb 26 '14 10:02

maephisto

1 Answer

One approach would be to perform a linear regression on each user's rank and check whether the slope is positive or negative. Luckily, PostgreSQL has a built-in aggregate function to do that - regr_slope:

SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
FROM     my_table
GROUP BY user_id

This query gives you the basic functionality. Now, you can dress it up a bit with case expressions if you like:

SELECT user_id, 
       CASE WHEN slope > 0 THEN 'positive' 
            WHEN slope < 0 THEN 'negative' 
            ELSE 'steady' END AS trend
FROM   (SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     my_table
        GROUP BY user_id) t
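To try the slope calculation without creating a table, here is a minimal sketch that inlines the sample rows from the question as a CTE. The CTE reuses the answer's placeholder names (my_table, user_id, rank1, timestamp1), which are assumptions and not the asker's actual schema. Note the sign convention: regr_slope(Y, X) measures how the rank number changes over time, so if a lower rank number is better (as in the question), an improving user has a negative slope. For this data the least-squares slope is -1 for user 1 and 0.5 for user 2, so the 'positive' and 'negative' labels above may need to be swapped to match the asker's definition of trend.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
-- Table and column names follow the answer's placeholders (assumed names).
WITH my_table (user_id, rank1, timestamp1) AS (
    VALUES (1, 1, 1435366459),
           (1, 2, 1435366458),
           (1, 3, 1435366457),
           (2, 8, 1435366456),
           (2, 6, 1435366455),
           (2, 7, 1435366454)
)
SELECT   user_id,
         -- regr_slope(Y, X): rank is the dependent variable, time the independent one
         regr_slope(rank1, timestamp1) AS slope
FROM     my_table
GROUP BY user_id
ORDER BY user_id;
```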

Edit:
Unfortunately, regr_slope doesn't have a built-in way to handle "top N" type requirements, so this should be handled separately, e.g., by a subquery with row_number:

-- Decoration outer query
SELECT user_id, 
       CASE WHEN slope > 0 THEN 'positive' 
            WHEN slope < 0 THEN 'negative' 
            ELSE 'steady' END AS trend
FROM   (-- Inner query to calculate the slope
        SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     (-- Inner query to get top N
                  -- (timestamp1 must be selected here, since regr_slope uses it)
                  SELECT user_id, rank1, timestamp1,
                         ROW_NUMBER() OVER (PARTITION BY user_id 
                                            ORDER BY timestamp1 DESC) AS rn
                  FROM   my_table) t
        WHERE    rn <= N -- Replace N with the number of rows you need
        GROUP BY user_id) t2
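Since the question asks only for the users whose rank is improving, the same idea can be sketched with a HAVING filter instead of a CASE label. This is an illustration under two assumptions: a lower rank number is better, so "improving" means a negative slope, and N is 2 so that the window actually discards the oldest sample row. The CTE names (my_table, last_n) are placeholders. Over the last two measurements the slope is -1 for user 1 and 2 for user 2, so only user 1 should pass the filter. Users with fewer than two rows in the window get a NULL slope and are excluded.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH my_table (user_id, rank1, timestamp1) AS (
    VALUES (1, 1, 1435366459),
           (1, 2, 1435366458),
           (1, 3, 1435366457),
           (2, 8, 1435366456),
           (2, 6, 1435366455),
           (2, 7, 1435366454)
),
-- Number each user's rows from newest to oldest.
last_n AS (
    SELECT user_id, rank1, timestamp1,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY timestamp1 DESC) AS rn
    FROM   my_table
)
SELECT   user_id,
         regr_slope(rank1, timestamp1) AS slope
FROM     last_n
WHERE    rn <= 2                                -- N = 2 for this illustration
GROUP BY user_id
HAVING   regr_slope(rank1, timestamp1) < 0      -- rank number falling = improving
ORDER BY user_id;
```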

answered Sep 21 '22 02:09

Mureinik
