Recently I found an interesting article about a regression clustering algorithm that can handle both regression and clustering tasks:
http://ncss.wpengine.netdna-cdn.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Regression_Clustering.pdf
I'm just curious: are there any techniques (libraries) to do this in Python? Thanks!
Spath's algorithm is not implemented in Python, as far as I know.
But you could replicate its results using Gaussian mixture models in scikit-learn:
import numpy as np
from sklearn.mixture import GaussianMixture
import matplotlib.pyplot as plt
# generate random data
np.random.seed(1)
n = 10
x1 = np.random.uniform(0, 20, size=n)
x2 = np.random.uniform(0, 20, size=n)
y1 = x1 + np.random.normal(size=n)
y2 = 15 - x2 + np.random.normal(size=n)
x = np.concatenate([x1, x2])
y = np.concatenate([y1, y2])
data = np.vstack([x, y]).T
model = GaussianMixture(n_components=2).fit(data)
plt.scatter(x, y, c=model.predict(data))
plt.show()
This code produces a picture similar to the one in the paper:

The GMM is different from the Spath algorithm: the former maximizes the likelihood of ALL the data (X and y jointly), while the latter maximizes only the R^2 of y given X. In my opinion, for most practical problems you would prefer the GMM.
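For illustration, each fitted Gaussian component implies a regression line of its own: with a full 2x2 covariance matrix, the conditional mean E[y|x] = mu_y + (cov_xy / cov_xx) * (x - mu_x) is linear in x. A minimal sketch that reuses the model fitted above:
# recover the regression line implied by each Gaussian component
for k in range(model.n_components):
    mu_x, mu_y = model.means_[k]
    cov = model.covariances_[k]  # 2x2 covariance of (x, y)
    slope = cov[0, 1] / cov[0, 0]
    intercept = mu_y - slope * mu_x
    print('component %d: y = %.2f + %.2f * x' % (k, intercept, slope))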
If you still want the Spath algorithm, it could be implemented with a class like this, which runs a version of the EM algorithm:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.base import RegressorMixin, BaseEstimator, clone
class ClusteredRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, n_components=2, base=Ridge(), random_state=1, max_iter=100, tol=1e-10, verbose=False):
        self.n_components = n_components
        self.base = base
        self.random_state = random_state
        self.max_iter = max_iter
        self.tol = tol
        self.verbose = verbose
    def fit(self, X, y):
        np.random.seed(self.random_state)
        self.estimators_ = [clone(self.base) for i in range(self.n_components)]
        # initialize cluster responsibilities randomly
        self.resp_ = np.random.uniform(size=(X.shape[0], self.n_components))
        self.resp_ /= self.resp_.sum(axis=1, keepdims=True)
        for it in range(self.max_iter):
            old_resp = self.resp_.copy()
            # Estimate sample-weighted regressions, one per cluster
            errors = np.empty(shape=self.resp_.shape)
            for i, est in enumerate(self.estimators_):
                est.fit(X, y, sample_weight=self.resp_[:, i])
                errors[:, i] = y - est.predict(X)
            self.mse_ = np.sum(self.resp_ * errors**2) / X.shape[0]  # responsibility-weighted MSE
            if self.verbose:
                print(self.mse_)
            # Recalculate responsibilities
            self.resp_ = np.exp(-errors**2 / self.mse_)
            self.resp_ /= self.resp_.sum(axis=1, keepdims=True)
            # stop if the change in responsibilities is small
            delta = np.abs(self.resp_ - old_resp).mean()
            if delta < self.tol:
                break
        self.n_iter_ = it
        return self
    def predict(self, X):
        """ Calculate a matrix of conditional predictions """
        return np.vstack([est.predict(X) for est in self.estimators_]).T
    def predict_proba(self, X, y):
        """ Estimate cluster probabilities of labeled data """
        # residuals of y against each cluster's regression; shape (n_samples, n_components)
        errors = y[:, np.newaxis] - self.predict(X)
        resp_ = np.exp(-errors**2 / self.mse_)
        resp_ /= resp_.sum(axis=1, keepdims=True)
        return resp_
This code is similar to the Spath algorithm, with the only difference that it uses soft "responsibilities" of each cluster for each observation instead of hard cluster assignments (this makes it easier to optimize). You can see that the resulting cluster assignment is similar to that of the GMM:
model = ClusteredRegressor()
model.fit(x[:, np.newaxis], y)
labels = np.argmax(model.resp_, axis=1)
plt.scatter(x, y, c=labels)
plt.show()
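For comparison, a k-means-style hard-assignment variant in the spirit of Spath's clusterwise regression can be sketched in a few lines (the paper's exact exchange procedure differs in its details; this is only a rough illustration with an iteration cap and no other safeguards): assign each point to the regressor with the smallest squared residual, refit each regressor on its points, and repeat until the assignment stops changing.
est = [Ridge() for _ in range(2)]
X = x[:, np.newaxis]
hard_labels = np.random.randint(2, size=len(x))  # random initial hard assignment
for _ in range(100):
    for k in range(2):
        if np.any(hard_labels == k):  # skip refitting if a cluster went empty
            est[k].fit(X[hard_labels == k], y[hard_labels == k])
    # reassign each point to the regressor with the smallest squared residual
    errors = np.vstack([(y - e.predict(X)) ** 2 for e in est]).T
    new_labels = np.argmin(errors, axis=1)
    if np.all(new_labels == hard_labels):
        break
    hard_labels = new_labels
plt.scatter(x, y, c=hard_labels)
plt.show()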

Unfortunately, the ClusteredRegressor model above cannot be applied to predict test data, because its output depends on the data labels (y). However, if you modify the code further, you could predict cluster probabilities conditional on X; in that case, the model would be useful for prediction.
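One possible modification along those lines (my own sketch, not something from the paper): train a classifier on X against the inferred cluster labels, then use it to estimate cluster probabilities for unlabeled points and blend the per-cluster predictions.
from sklearn.linear_model import LogisticRegression
# hypothetical extension: learn p(cluster | X) from the fitted responsibilities
clf = LogisticRegression().fit(x[:, np.newaxis], np.argmax(model.resp_, axis=1))
x_new = np.linspace(0, 20, 50)[:, np.newaxis]
proba = clf.predict_proba(x_new)  # cluster probabilities from X alone
y_cond = model.predict(x_new)  # per-cluster conditional predictions
y_hat = np.sum(proba * y_cond, axis=1)  # probability-weighted prediction
Note that in this toy example the two clusters overlap heavily in X, so the classifier has little to go on; the scheme works better when the clusters are separated in feature space.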