How to get scikit learn to find simple non-linear relationship

I have some data in a pandas dataframe (although pandas is not the point of this question). As an experiment I made column ZR as column Z divided by column R. As a first step using scikit-learn I wanted to see if I could predict ZR from the other columns (which should be possible, as I made it from R and Z). My steps have been:
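For reference, a setup like the one described can be reproduced with synthetic data (the values here are hypothetical; only the column names and the Z / R construction come from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: random positive columns,
# with ZR constructed as Z divided by R, as described above.
rng = np.random.default_rng(0)
results = pd.DataFrame(
    rng.uniform(1.0, 10.0, size=(100, 5)),
    columns=['R', 'T', 'V', 'X', 'Z'],
)
results['ZR'] = results['Z'] / results['R']
```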

import numpy as np
from sklearn import preprocessing, linear_model

columns = ['R', 'T', 'V', 'X', 'Z']
for c in columns:
    results[c] = preprocessing.scale(results[c])
results['ZR'] = preprocessing.scale(results['ZR'])
labels = results["ZR"].values
features = results[columns].values
# print(labels)
# print(features)
regr = linear_model.LinearRegression()
regr.fit(features, labels)
print(regr.coef_)
print(np.mean((regr.predict(features) - labels) ** 2))

This gives

[ 0.36472515 -0.79579885 -0.16316067  0.67995378  0.59256197]
0.458552051342
  1. The preprocessing seems wrong, as I think it destroys the Z/R relationship. What's the right way to preprocess in this situation?
  2. Is there some way to get near 100% accuracy? Linear regression is the wrong tool, as the relationship is non-linear.
  3. The five features are highly correlated in my data. Is non-negative least squares implemented in scikit-learn? (I can see it mentioned in the mailing list but not the docs.) My aim would be to get as many coefficients set to zero as possible.
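On point 3, scikit-learn's `Lasso` does not enforce non-negativity, but its L1 penalty does tend to set coefficients of redundant features exactly to zero, which matches the stated aim. A minimal sketch with synthetic correlated features (not the asker's data):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Five highly correlated features: copies of one signal plus small noise.
X = base + 0.01 * rng.normal(size=(200, 5))
y = 2.0 * X[:, 0]

model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)  # with near-duplicate features, some coefficients are exactly zero
```

The `alpha` parameter controls how aggressively coefficients are shrunk toward zero; larger values produce sparser models.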
asked Mar 07 '14 by graffe


1 Answer

You should easily be able to get a decent fit using random forest regression, without any preprocessing, since it is a nonlinear method:

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=10, max_features=2)
model.fit(features, labels)

You can play with the parameters to get better performance.
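Putting the answer together end to end with synthetic data in the shape described (the data itself is made up; only the ZR = Z / R construction and the model parameters come from the thread):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
data = rng.uniform(1.0, 10.0, size=(500, 5))   # stand-ins for R, T, V, X, Z
labels = data[:, 4] / data[:, 0]               # ZR = Z / R

model = RandomForestRegressor(n_estimators=10, max_features=2)
model.fit(data, labels)
print(model.score(data, labels))  # R^2 on the training data
```

Note that scoring on the training data overstates performance; a train/test split (e.g. `sklearn.model_selection.train_test_split`) gives a more honest estimate.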

answered Sep 22 '22 by cfh