Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identifying trend with SQL query

Tags:

sql

trend

I have a table (let's call it Data) with a set of object IDs, numeric values and dates. I would like to identify the objects whose values had a positive trend over the last X minutes (say, an hour).

Example data:

entity_id | value | date

1234      | 15    | 2014-01-02 11:30:00

5689      | 21    | 2014-01-02 11:31:00

1234      | 16    | 2014-01-02 11:31:00

I tried looking at similar questions, but didnt find anything that helps unfortunately...

like image 704
Dan Markhasin Avatar asked Jan 02 '14 12:01

Dan Markhasin


People also ask

How do I track changes in SQL database?

Right click on the table you want to track changes. Click Properties, click Change Tracking, then in the right pane set Change Tracking to TRUE.

Can you use SQL to analyze data?

For many, SQL is the "meat and potatoes" of data analysis—it's used for accessing, cleaning, and analyzing data that's stored in databases. It's very easy to learn, yet it's employed by the world's largest companies to solve incredibly challenging problems.

How do you find outliers in SQL?

A fast way to identify outliers is to sort the relevant values in both ascending and descending order. This allows you to quickly skim through the highest and lowest values. If you have a sense of what you are expecting from your data, this can help you quickly identify any unexpected values.

Can you do regressions in SQL?

Simple Linear Regression is handy for the SQL Programmer in making a prediction of a linear trend and giving a figure for the level probability for the prediction, and what is more, they are easy to do with the aggregation that is built into SQL.


1 Answers

You inspired me to go and implement linear regression in SQL Server. This could be modified for MySQL/Oracle/Whatever without too much trouble. It's the mathematically best way of determining the trend over the hour for each entity_id and it will select out only the ones with a positive trend.

It implements the formula for calculating B1hat listed here: https://en.wikipedia.org/wiki/Regression_analysis#Linear_regression

create table #temp
(
    entity_id int,
    value int,
    [date] datetime
)

insert into #temp (entity_id, value, [date])
values
(1,10,'20140102 07:00:00 AM'),
(1,20,'20140102 07:15:00 AM'),
(1,30,'20140102 07:30:00 AM'),
(2,50,'20140102 07:00:00 AM'),
(2,20,'20140102 07:47:00 AM'),
(3,40,'20140102 07:00:00 AM'),
(3,40,'20140102 07:52:00 AM')

select entity_id, 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar)) as Beta
from
(
    select entity_id,
        avg(value) over(partition by entity_id) as ybar,
        value as y,
        avg(datediff(second,'20140102 07:00:00 AM',[date])) over(partition by entity_id) as xbar,
        datediff(second,'20140102 07:00:00 AM',[date]) as x
    from #temp
    where [date]>='20140102 07:00:00 AM' and [date]<'20140102 08:00:00 AM'
) as Calcs
group by entity_id
having 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar))>0
like image 183
John Chrysostom Avatar answered Sep 18 '22 15:09

John Chrysostom