I am trying to train GBRT (gradient boosted regression trees) on my training data. The same data works fine with other classifiers but gives the above error with GBRT. Please help. The code is as follows:
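```python
import scipy.sparse as sp
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = load_files('train')
vectorizer = TfidfVectorizer(encoding='latin1')
# Read raw bytes so that the vectorizer decodes them with encoding='latin1'
X_train = vectorizer.fit_transform((open(f, 'rb').read() for f in dataset.filenames))
assert sp.issparse(X_train)
print("n_samples: %d, n_features: %d" % X_train.shape)
y_train = dataset.target

def benchmark(clf_class, params, name):
    # Fails for GBRT because X_train is a scipy sparse matrix
    clf = clf_class(**params).fit(X_train, y_train)
```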
I came across the same problem trying to train a GradientBoostingClassifier on data loaded by load_svmlight_files. I solved it by converting the sparse matrix to a dense NumPy array:
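```python
X_train = X_train.toarray()
```

(`X_train.todense()` also densifies the data, but it returns a `numpy.matrix` rather than a plain `numpy.ndarray`, which recent scikit-learn versions discourage, so `toarray()` is the safer choice.)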
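A minimal end-to-end sketch of the fix, assuming a made-up four-document corpus and made-up labels in place of the files loaded from `'train'`: vectorize, densify with `toarray()`, then fit `GradientBoostingClassifier`. The `n_estimators` and `random_state` values are arbitrary choices for the illustration.

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical toy corpus standing in for the documents in 'train'
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices rose sharply today",
    "markets fell as prices dropped",
]
y_train = [0, 0, 1, 1]  # hypothetical labels: 0 = animals, 1 = finance

vectorizer = TfidfVectorizer()
X_sparse = vectorizer.fit_transform(docs)  # scipy.sparse CSR matrix
assert sp.issparse(X_sparse)

# Densify before handing the data to gradient boosting;
# toarray() returns a plain numpy.ndarray
X_dense = X_sparse.toarray()

clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X_dense, y_train)

# Test documents go through the same fitted vectorizer and are densified too
X_test = vectorizer.transform(["the dog sat in the yard"]).toarray()
print(clf.predict(X_test))
```

Keep in mind that the dense array stores every entry, so it needs roughly n_samples × n_features × 8 bytes for float64 data; with a large TF-IDF vocabulary that can be much more memory than the sparse matrix used.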
This is because GBRT in sklearn requires X (the training data) to be array-like, not a sparse matrix; see sklearn-gbrt.
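To make "array-like, not sparse matrix" concrete, here is a small sketch of a hypothetical helper (`to_dense_if_sparse` is not part of sklearn) that densifies only when the input is sparse, plus a check of what `todense()` and `toarray()` actually return.

```python
import numpy as np
import scipy.sparse as sp

def to_dense_if_sparse(X):
    """Hypothetical helper: return X as a numpy.ndarray if it is a scipy
    sparse matrix, otherwise return it unchanged."""
    if sp.issparse(X):
        return X.toarray()
    return X

X = sp.csr_matrix([[0.0, 1.5], [2.0, 0.0]])

print(type(X.todense()).__name__)  # matrix
print(type(X.toarray()).__name__)  # ndarray
print(isinstance(to_dense_if_sparse(X), np.ndarray))  # True
```

Calling such a helper on both the training and test data before `fit` and `predict` keeps the two code paths consistent.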
I hope this helps!