
`np.concatenate` a numpy array with a sparse matrix

A dataset contains numerical and categorical variables, and I split them into two parts:

cont_data = data[cont_variables].values
disc_data = data[disc_variables].values

Then I used sklearn.preprocessing.OneHotEncoder to encode the categorical data, and tried to merge the encoded categorical data with the numerical data:

np.concatenate((cont_data, disc_data_coded), axis=1)

But the following error occurs:

ValueError: all the input arrays must have same number of dimensions

I checked that the shapes are compatible (both have the same number of rows):

print(cont_data.shape)        # (24000, 35)
print(disc_data_coded.shape)  # (24000, 26)

Finally, I found that cont_data is a numpy array, while disc_data_coded is a sparse matrix:

>>> disc_data_coded
<24000x26 sparse matrix of type '<class 'numpy.float64'>'
with 312000 stored elements in Compressed Sparse Row format>

When I set the sparse parameter of OneHotEncoder to False, everything works. But the question is: how can I merge a numpy array with a sparse matrix directly, without setting sparse=False?

htredleaf asked Mar 22 '18 03:03



1 Answer

Sparse matrices are not subclasses of numpy arrays, so numpy functions often don't work on them. Use the scipy.sparse functions instead, such as sparse.vstack and sparse.hstack. But then all inputs have to be sparse.
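For the column-wise merge in the question, the sparse route might look like the sketch below. The small arrays are hypothetical stand-ins for cont_data and disc_data_coded, not the real data, and the dense block is converted to CSR explicitly before stacking:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins: a dense numeric block and a sparse one-hot block
cont_data = np.arange(12, dtype=float).reshape(4, 3)
disc_data_coded = sparse.csr_matrix(np.eye(4)[:, :2])

# Make the dense block sparse, then stack the two blocks side by side
merged = sparse.hstack(
    (sparse.csr_matrix(cont_data), disc_data_coded), format="csr"
)
print(merged.shape)  # (4, 5)
```

The result stays sparse, so it suits downstream code that accepts scipy.sparse input.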

Or make the sparse matrix dense first, with .toarray(), and use np.concatenate.
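A minimal sketch of the dense route, again with small made-up arrays standing in for cont_data and disc_data_coded. Note that .toarray() allocates a full dense copy, so this gives up the memory savings of the sparse format:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the arrays in the question
cont_data = np.ones((4, 3))
disc_data_coded = sparse.csr_matrix(np.eye(4)[:, :2])

# Densify the sparse block, then join along the column axis
merged = np.concatenate((cont_data, disc_data_coded.toarray()), axis=1)
print(merged.shape)  # (4, 5)
```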

Do you want the result to be sparse or dense?

In [32]: sparse.vstack((sparse.csr_matrix(np.arange(10)),sparse.csr_matrix(np.ones((3,10)))))
Out[32]: 
<4x10 sparse matrix of type '<class 'numpy.float64'>'
    with 39 stored elements in Compressed Sparse Row format>
In [33]: np.concatenate((sparse.csr_matrix(np.arange(10)).toarray(),np.ones((3,10))))
Out[33]: 
array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
hpaulj answered Nov 05 '22 13:11
