
A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array

Here is my code. I am trying to train GBRT (gradient boosted regression trees) on this training data. The same data works well with other classifiers, but GBRT raises the error above. Please help:

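import scipy.sparse as sp
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
# One document per file; sub-folder names under 'train' become the labels
dataset = load_files('train')
# dataset.data holds raw bytes, which the vectorizer decodes as latin1
vectorizer = TfidfVectorizer(encoding='latin1')
X_train = vectorizer.fit_transform(dataset.data)
assert sp.issparse(X_train)  # TF-IDF output is a scipy.sparse CSR matrix
print("n_samples: %d, n_features: %d" % X_train.shape)
y_train = dataset.target

def benchmark(clf_class, params, name):
    # GBRT raised the sparse-matrix error here; other classifiers accepted X_train as-is
    clf = clf_class(**params).fit(X_train, y_train)
    return clf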
asked May 28 '15 at 09:05 by Dhananjay Ambekar



2 Answers

I came across the same problem when training a GradientBoostingClassifier on data loaded with load_svmlight_files. I solved it by converting the sparse matrix to a dense NumPy array:

X_train = X_train.toarray()  # dense numpy.ndarray, as the error message suggests
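A minimal end-to-end sketch of this fix, assuming a hypothetical toy corpus and made-up numeric targets in place of the real files. It uses toarray() rather than todense(), because todense() returns a numpy.matrix while toarray() returns a plain ndarray; the estimator settings (n_estimators=10, random_state=0) are arbitrary choices for illustration.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus and targets standing in for the files in 'train'
docs = [
    "sparse matrices store only nonzero entries",
    "dense arrays store every entry explicitly",
    "gradient boosting builds an ensemble of regression trees",
    "tf idf weights terms by how rare they are across documents",
]
y = np.array([0.0, 1.0, 2.0, 3.0])

X_sparse = TfidfVectorizer().fit_transform(docs)  # scipy.sparse CSR matrix
assert sp.issparse(X_sparse)

# toarray() gives a plain numpy.ndarray; todense() would give numpy.matrix
X_dense = X_sparse.toarray()

model = GradientBoostingRegressor(n_estimators=10, random_state=0)
model.fit(X_dense, y)  # dense input, so no sparse-matrix error
preds = model.predict(X_dense)
print(X_dense.shape, preds.shape)
```

The same toarray() call applies unchanged to X_train in the question, as long as the dense copy fits in memory.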
answered Sep 22 '22 at 01:09 by Peiqin



This happens because GBRT in sklearn requires X (the training data) to be array-like, not a sparse matrix: sklearn-gbrt

I hope this helps!
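Because the estimator needs array-like input, one option is to densify only when the input is actually sparse. The sketch below uses two hypothetical helpers, densify and dense_nbytes (not scikit-learn functions). The second one estimates what a dense copy costs: n_samples × n_features × itemsize bytes, regardless of how few nonzeros the sparse matrix holds. That cost matters for TF-IDF matrices with large vocabularies. The 1,000 × 5,000 shape is an assumed example size.

```python
import numpy as np
import scipy.sparse as sp

def densify(X):
    """Return X as a dense ndarray, converting only if it is a scipy.sparse matrix."""
    if sp.issparse(X):
        return X.toarray()
    return np.asarray(X)

def dense_nbytes(X):
    """Bytes a dense copy of X would need: n_samples * n_features * itemsize."""
    n_samples, n_features = X.shape
    return n_samples * n_features * X.dtype.itemsize

# Assumed example size: 1,000 documents x 5,000 terms, float64, no nonzeros stored
X = sp.csr_matrix((1_000, 5_000))
# 1,000 * 5,000 * 8 bytes = 40,000,000 bytes for the dense copy
print(dense_nbytes(X), "bytes if densified")
```

Checking dense_nbytes before calling densify on the real X_train shows whether the conversion is feasible on the available RAM.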

answered Sep 23 '22 at 01:09 by Chung-Yen Hung
