Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

segmented linear regression in python

Is there a library in python to do segmented linear regression? I'd like to fit multiple lines to my data automatically to get something like this: segmented regression

Btw. I do know the number of segments.

like image 520
P3trus Avatar asked Jan 25 '12 07:01

P3trus


2 Answers

Probably, Numpy's numpy.piecewise() tool can be used.
More detailed description is shown here: How to apply piecewise linear fit in Python?

If this is not what is needed, then you may probably find some helpful information in these questions: https://datascience.stackexchange.com/questions/8266/is-there-a-library-that-would-perform-segmented-linear-regression-in-python

and here:
https://datascience.stackexchange.com/questions/8457/python-library-for-segmented-regression-a-k-a-piecewise-regression

like image 86
Erba Aitbayev Avatar answered Sep 22 '22 14:09

Erba Aitbayev


enter image description here

As mentioned in a comment above, segmented linear regression brings the problem of many free parameters. I therefore decided to go away from an approach, which uses n_segments * 3 - 1 parameters (i.e. n_segments - 1 segment positions, n_segment y-offests, n_segment slopes) and performs numerical optimization. Instead, I look for regions, which have already a roughly constant slope.

Algorithm

  • Calculate the slope of all points
  • Cluster points with a similar slope to segments (done by the DecisionTree)
  • Perform LinearRegression on the segments, found in the previous step

A decision tree is used instead of a clustering algorithm to get connected segments and not set of (non neighboring) points. The details of the segmentation can be adjusted by the decision trees parameters (currently max_leaf_nodes).

Code

import numpy as np
import matplotlib.pylab as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

# parameters for setup
n_data = 20

# segmented linear regression parameters
n_seg = 3

np.random.seed(0)
fig, (ax0, ax1) = plt.subplots(1, 2)

# example 1
#xs = np.sort(np.random.rand(n_data))
#ys = np.random.rand(n_data) * .3 + np.tanh(5* (xs -.5))

# example 2
xs = np.linspace(-1, 1, 20)
ys = np.random.rand(n_data) * .3 + np.tanh(3*xs)

dys = np.gradient(ys, xs)

rgr = DecisionTreeRegressor(max_leaf_nodes=n_seg)
rgr.fit(xs.reshape(-1, 1), dys.reshape(-1, 1))
dys_dt = rgr.predict(xs.reshape(-1, 1)).flatten()

ys_sl = np.ones(len(xs)) * np.nan
for y in np.unique(dys_dt):
    msk = dys_dt == y
    lin_reg = LinearRegression()
    lin_reg.fit(xs[msk].reshape(-1, 1), ys[msk].reshape(-1, 1))
    ys_sl[msk] = lin_reg.predict(xs[msk].reshape(-1, 1)).flatten()
    ax0.plot([xs[msk][0], xs[msk][-1]],
             [ys_sl[msk][0], ys_sl[msk][-1]],
             color='r', zorder=1)

ax0.set_title('values')
ax0.scatter(xs, ys, label='data')
ax0.scatter(xs, ys_sl, s=3**2, label='seg lin reg', color='g', zorder=5)
ax0.legend()

ax1.set_title('slope')
ax1.scatter(xs, dys, label='data')
ax1.scatter(xs, dys_dt, label='DecisionTree', s=2**2)
ax1.legend()

plt.show()
like image 23
Markus Dutschke Avatar answered Sep 21 '22 14:09

Markus Dutschke