
Does Scikit-learn: partial_dependence only take 2 features?

I'm using scikit-learn 0.14.1 and want to get the partial dependence values themselves, rather than the figure that plot_partial_dependence returns. I thought I could use partial_dependence for this, but I'm running into trouble.

It seems partial_dependence only accepts two features, but I only want the values for one feature.

When I modify the sample code from scikit-learn's website (changing target_feature = (1, 2) to target_feature = (1)), it complains:

*** ValueError: need more than 1 value to unpack

Here's the code:

from sklearn.cross_validation import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble.partial_dependence import plot_partial_dependence
from sklearn.ensemble.partial_dependence import partial_dependence
from sklearn.datasets.california_housing import fetch_california_housing
cal_housing = fetch_california_housing()

X_train, X_test, y_train, y_test = train_test_split(cal_housing.data,
                                                    cal_housing.target,
                                                    test_size=0.2,
                                                    random_state=1)
names = cal_housing.feature_names

clf = GradientBoostingRegressor(n_estimators=100, max_depth=4,
                                learning_rate=0.1, loss='huber',
                                random_state=1)
clf.fit(X_train, y_train)
target_feature = (1)
pdp, (x_axis, y_axis) = partial_dependence(clf, target_feature, X=X_train, grid_resolution=50)

In the source code, it says:

target_variables : array-like, dtype=int
    The target features for which the partial dependecy should be
    computed (size should be smaller than 3 for visual renderings).

Can anyone help me figure out what I did wrong, or show me how to extract the partial dependence values for the ONE feature I need?

Thank you so much!

user2921752 asked Nov 01 '22 04:11


1 Answer

Here's Peter Prettenhofer's answer to my email. I'm posting it here in case someone else needs it too.

Here is the issue:

The left-hand side of the assignment assumes that the result is a two-way partial dependence plot, but it's a one-way PDP, so only one axis is returned. This should fix it:

pdp, (x_axis, ) = partial_dependence(clf, target_feature, X=X_train, grid_resolution=50)
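To see why the original line fails, here is a minimal sketch that reproduces the unpacking behaviour without scikit-learn. The function fake_partial_dependence is a hypothetical stand-in (not the library's code); it only assumes, as the fix above implies, that the second return value holds one axis per target feature.

```python
def fake_partial_dependence(target_features, grid_resolution=5):
    """Hypothetical stand-in that mimics the (pdp, axes) return shape only."""
    # Note: (1) is just the int 1, not a tuple, so accept a bare int too.
    if isinstance(target_features, int):
        target_features = [target_features]
    # One grid axis per target feature (dummy grid values).
    axes = [[float(i) for i in range(grid_resolution)] for _ in target_features]
    # One row of dummy values covering every point on the grid.
    n_points = grid_resolution ** len(axes)
    pdp = [[0.0] * n_points]
    return pdp, axes


# One feature, but the left-hand side expects two axes: unpacking fails.
try:
    pdp, (x_axis, y_axis) = fake_partial_dependence(1)
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# One feature, one axis on the left-hand side: works.
pdp_one, (x_axis_one,) = fake_partial_dependence(1)

# Two features, two axes on the left-hand side: works.
pdp_two, (x_axis_two, y_axis_two) = fake_partial_dependence((1, 2))
```

The trailing comma in (x_axis, ) matters: it makes the left-hand side a one-element tuple pattern, so the single returned axis is unpacked into x_axis.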

It works perfectly. Thank you very much!
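For anyone wondering what the returned values represent, here is a rough sketch of the brute-force definition of one-way partial dependence: fix the target feature at each grid value and average the model's predictions over the data. The names one_way_partial_dependence and toy_predict are hypothetical, and this is not necessarily how scikit-learn computes it internally, so its numbers may differ (for example by an offset).

```python
def one_way_partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with column `feature` forced to each grid value."""
    averages = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the input data is not changed
            modified[feature] = value  # force the target feature to the grid value
            total += predict(modified)
        averages.append(total / len(rows))
    return averages


def toy_predict(row):
    # Hypothetical model for illustration: y = 2 * x0 + x1
    return 2.0 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
grid = [0.0, 1.0, 2.0]
pd_values = one_way_partial_dependence(toy_predict, rows, feature=0, grid=grid)
```

With this toy model, each value is 2 * grid_value plus the mean of x1 (which is 3 here), so the curve only reflects the effect of feature 0.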

user2921752 answered Nov 06 '22 01:11