I have some data in a sql database and I'd like to calculate the slope. The data has this layout:
Date | Keyword | Score
2012-01-10 | ipad | 0.12
2012-01-11 | ipad | 0.17
2012-01-12 | ipad | 0.24
2012-01-10 | taco | 0.19
2012-01-11 | taco | 0.34
2012-01-12 | taco | 0.45
I'd like the final output to look like this by creating a new table using SQL:
Date | Keyword | Score | Slope
2012-01-10 | ipad | 0.12 | 0.06
2012-01-11 | ipad | 0.17 | 0.06
2012-01-12 | ipad | 0.24 | 0.06
2012-01-10 | taco | 0.19 | 0.13
2012-01-11 | taco | 0.34 | 0.13
2012-01-12 | taco | 0.45 | 0.13
To complicate things, not all Keywords have 3 dates worth of data, some have only 2 for instance.
The simpler the SQL the better since my database is proprietary and I'm not quite sure what formulas are available, although I know it can do OVER(PARTITION BY) if that helps. Thank you!
UPDATE: I define the slope as best fit y=mx+p aka in excel it would be =slope()
Here is another actual example that I usually manipulate in excel:
date keyword score slope
1/22/2012 water bottle 0.010885442 0.000334784
1/23/2012 water bottle 0.011203949 0.000334784
1/24/2012 water bottle 0.008460835 0.000334784
1/25/2012 water bottle 0.010363991 0.000334784
1/26/2012 water bottle 0.011800716 0.000334784
1/27/2012 water bottle 0.012948411 0.000334784
1/28/2012 water bottle 0.012732459 0.000334784
1/29/2012 water bottle 0.011682568 0.000334784
Score, (N * Sum_XY - Sum_X * Sum_Y)/(N * Sum_X2 - Sum_X * Sum_X) AS Slope FROM Scores INNER JOIN ( SELECT Keyword, COUNT(*) AS N, SUM(CAST(Date as float)) AS Sum_X, SUM(CAST(Date as float) * CAST(Date as float)) AS Sum_X2, SUM(Score) AS Sum_Y, SUM(CAST(Date as float) * Score) AS Sum_XY FROM Scores GROUP BY Keyword ) G ...
SQL Server INTERCEPT functionINTERCEPT_q. Use INTERCEPT_q to calculate the point at which a line will intersect the y-axis by using existing x-values and y-values. The y-intercept is the value of the point at which intersects the line at x = 0.
You can use the string expression argument in an SQL aggregate function to perform a calculation on values in a field. For example, you could calculate a percentage (such as a surcharge or sales tax) by multiplying a field value by a fraction.
Simple Linear Regression is handy for the SQL Programmer in making a prediction of a linear trend and giving a figure for the level probability for the prediction, and what is more, they are easy to do with the aggregation that is built into SQL.
The cleanest one I could make:
SELECT
Scores.Date, Scores.Keyword, Scores.Score,
(N * Sum_XY - Sum_X * Sum_Y)/(N * Sum_X2 - Sum_X * Sum_X) AS Slope
FROM Scores
INNER JOIN (
SELECT
Keyword,
COUNT(*) AS N,
SUM(CAST(Date as float)) AS Sum_X,
SUM(CAST(Date as float) * CAST(Date as float)) AS Sum_X2,
SUM(Score) AS Sum_Y,
SUM(CAST(Date as float) * Score) AS Sum_XY
FROM Scores
GROUP BY Keyword
) G ON G.Keyword = Scores.Keyword;
It uses Simple Linear Regression to calculate the slope.
Result:
Date Keyword Score Slope
2012-01-22 water bottle 0,010885442 0,000334784345222076
2012-01-23 water bottle 0,011203949 0,000334784345222076
2012-01-24 water bottle 0,008460835 0,000334784345222076
2012-01-25 water bottle 0,010363991 0,000334784345222076
2012-01-26 water bottle 0,011800716 0,000334784345222076
2012-01-27 water bottle 0,012948411 0,000334784345222076
2012-01-28 water bottle 0,012732459 0,000334784345222076
2012-01-29 water bottle 0,011682568 0,000334784345222076
Every database system seems to have a different approach to converting dates to numbers:
TO_SECONDS(date)
or TO_DAYS(date)
TO_NUMBER(TO_CHAR(date, 'J'))
or date - TO_DATE('1','yyyy')
CAST(date AS float)
(or equivalent CONVERT
)If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With