 

Regression for each customer's data

I have data containing customers' purchase histories, with the sales value of every order. I'd like to get some sort of trend in each customer's spending over time. My idea was to fit a regression for each customer and extract the coefficient afterwards. Is it possible to do this efficiently with pandas (the data has about 1,000,000 transactions)? If so, how?

For clarity, here is the structure of the data:

        Date        Customer_ID     Sales_Value     
     2014-07-01         1            62.946002  
     2014-12-01         2            62.947733  
     2013-05-01         3            27.328221  
     2015-01-01         1            30.023658

This is the structure of the transaction data; it has several other columns that are not needed here. Unfortunately the data is monthly, so every Date has the form 20xx-xx-01.

What I would like now is an array that gives me, for every customer, the regression coefficient of Sales_Value over the whole time interval covered by the transaction data. So basically something like this:

Customer_ID  trend_coeff
  1             -0.5
  2               0
  3               0

(The trend_coeff values are of course made up, just for demonstration.)

Thank you for your help!

TheDude asked Jun 29 '26 12:06



1 Answer

Say you start off with something like this:

import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'b': range(9),
    'c': range(1, 10)})
>>> df
    a   b   c
0   1   0   1
1   2   1   2
2   3   2   3
3   1   3   4
4   2   4   5
5   3   5   6
6   1   6   7
7   2   7   8
8   3   8   9

To perform linear regression between 'b' and 'c' for each value of 'a', you can do this:

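from sklearn import linear_model

def find_for_a(g):
    # Fit c ~ b within one group; sklearn expects a 2-D feature array, hence [:, None].
    p = linear_model.LinearRegression().fit(g.b.values[:, None], g.c.values)
    # Return the slope and intercept as one labelled row for this group.
    return pd.Series({'coef': p.coef_[0], 'intercept': p.intercept_})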

>>> df.groupby('a').apply(find_for_a)
    coef    intercept
a       
1   1.0     1.0
2   1.0     1.0
3   1.0     1.0
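
For the data in the question, the same idea can be sketched without fitting a separate model object per group, which may matter with around a million rows. For a single predictor, the least-squares slope is sum((t - mean(t)) * (y - mean(y))) / sum((t - mean(t))**2), and both sums can be computed per customer with groupby. The column names Date, Customer_ID and Sales_Value are taken from the question; the function name trend_per_customer, the choice of "months since the earliest date" as the time unit, and returning NaN for customers with only one month of data are my assumptions, not something the question specifies.

```python
import numpy as np
import pandas as pd

def trend_per_customer(df):
    """Per-customer OLS slope of Sales_Value against time in months (hypothetical helper)."""
    dates = pd.to_datetime(df["Date"])
    # Assumption: measure time as whole months since the earliest date in the data.
    first = dates.min()
    months = (dates.dt.year - first.year) * 12 + (dates.dt.month - first.month)
    work = pd.DataFrame({
        "id": df["Customer_ID"].to_numpy(),
        "t": months.to_numpy(dtype=float),
        "y": df["Sales_Value"].to_numpy(dtype=float),
    })
    # Centre t and y within each customer.
    dt = work["t"] - work.groupby("id")["t"].transform("mean")
    dy = work["y"] - work.groupby("id")["y"].transform("mean")
    # Closed-form simple-regression slope: sum(dt * dy) / sum(dt ** 2) per customer.
    num = (dt * dy).groupby(work["id"]).sum()
    den = (dt * dt).groupby(work["id"]).sum()
    # A customer seen in only one month has den == 0; report NaN instead of dividing by zero.
    slope = num / den.where(den > 0)
    return slope.rename("trend_coeff").rename_axis("Customer_ID").reset_index()
```

Calling trend_per_customer(df) on the transaction frame returns one row per Customer_ID with a trend_coeff column, in sales units per month. If you would rather have 0 than NaN for customers with a single data point, as in the question's example table, chaining .fillna(0) onto the result should do it.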
Ami Tavory answered Jul 01 '26 02:07



