I have data containing customers' purchase history, with the sales value of every order. I'd like to get some sort of trend of each customer's spending over time. I thought about fitting a regression for each customer and extracting the coefficient afterwards. Is it possible to do this efficiently with pandas (I have ~1,000,000 transactions in the data)? If yes, how can I do that?
To make this clearer, here is the structure of the data:
Date Customer_ID Sales_Value
2014-07-01 1 62.946002
2014-12-01 2 62.947733
2013-05-01 3 27.328221
2015-01-01 1 30.023658
This is the structure of the transaction data; it has several other columns that are not needed in this case. Unfortunately the data is on a monthly basis, so the Date always has the format 20xx-xx-01.
What I would like to have now is an array that gives me, for every customer, the coefficient of the regression of Sales_Value over the whole time interval covered by the transaction data. So basically something like this:
Customer_ID trend_coeff
1 -0.5
2 0
3 0
(The numbers for the trend_coeff are of course made up just for demonstration)
Thank you for your help!
Say you start off with something like this:
import pandas as pd
df = pd.DataFrame({
    'a': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'b': range(9),
    'c': range(1, 10)})
>>> df
   a  b  c
0  1  0  1
1  2  1  2
2  3  2  3
3  1  3  4
4  2  4  5
5  3  5  6
6  1  6  7
7  2  7  8
8  3  8  9
To perform linear regression between 'b' and 'c' for each value of 'a', you can do this:
from sklearn import linear_model
def find_for_a(g):
    # scikit-learn expects a 2-D feature array, hence the [:, None]
    p = linear_model.LinearRegression().fit(g.b.values[:, None], g.c.values)
    return pd.Series({'coef': p.coef_[0], 'intercept': p.intercept_})
>>> df.groupby('a').apply(find_for_a)
   coef  intercept
a
1   1.0        1.0
2   1.0        1.0
3   1.0        1.0
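For the data in the question, calling a function per group may get slow with ~1,000,000 rows, so here is a rough sketch of the same idea done with grouped sums only. It uses the closed-form least-squares slope (sum of centered x times centered y, divided by the sum of squared centered x), with x taken as the number of months since the earliest date. The function name trend_per_customer and the sample values are made up for illustration; the column names are the ones from the question. A customer with only one distinct month has no defined slope, and I've assumed NaN is acceptable there.

```python
import numpy as np
import pandas as pd


def trend_per_customer(df):
    """Per-customer least-squares slope of Sales_Value against month index."""
    dates = pd.to_datetime(df["Date"])
    # Month index: whole months elapsed since the earliest date in the data.
    months = dates.dt.year * 12 + dates.dt.month
    work = pd.DataFrame({
        "id": df["Customer_ID"].to_numpy(),
        "x": (months - months.min()).to_numpy(dtype=float),
        "y": df["Sales_Value"].to_numpy(dtype=float),
    })
    grouped = work.groupby("id")
    # Center x and y within each customer.
    xc = work["x"] - grouped["x"].transform("mean")
    yc = work["y"] - grouped["y"].transform("mean")
    num = (xc * yc).groupby(work["id"]).sum()
    den = (xc ** 2).groupby(work["id"]).sum()
    # den == 0 means a single distinct month: leave the slope undefined (NaN).
    slope = num / den.where(den > 0)
    return slope.rename("trend_coeff").rename_axis("Customer_ID")


# Made-up sample data in the shape described in the question.
df = pd.DataFrame({
    "Date": ["2014-01-01", "2014-02-01", "2014-03-01",
             "2014-01-01", "2014-03-01", "2013-05-01"],
    "Customer_ID": [1, 1, 1, 2, 2, 3],
    "Sales_Value": [10.0, 20.0, 30.0, 50.0, 50.0, 27.0],
})

trend = trend_per_customer(df)
print(trend)
```

The resulting coefficient is in sales units per month. If a customer can have several orders in the same month, you may want to sum Sales_Value per customer and month first; whether that is the right aggregation depends on what "trend" should mean for your data.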