`np.concatenate` a numpy array with a sparse matrix

Question

A dataset contains numerical and categorical variables, and I split them into two parts:

cont_data = data[cont_variables].values
disc_data = data[disc_variables].values

Then I used sklearn.preprocessing.OneHotEncoder to encode the categorical data and tried to merge the encoded categorical data with the numerical data:

np.concatenate((cont_data, disc_data_coded), axis=1)

But the following error occurs:

ValueError: all the input arrays must have same number of dimensions

I checked that both arrays are two-dimensional and have the same number of rows:

print(cont_data.shape)        # (24000, 35)
print(disc_data_coded.shape)  # (24000, 26)

Finally, I found that cont_data is a numpy array, while disc_data_coded is a scipy sparse matrix:

>>> disc_data_coded
<24000x26 sparse matrix of type '<class 'numpy.float64'>'
with 312000 stored elements in Compressed Sparse Row format>

When I set the sparse parameter of OneHotEncoder to False, everything works. But how can I merge a numpy array with a sparse matrix directly, without setting sparse=False?

hpaulj · Accepted Answer

Sparse matrices are not subclasses of numpy arrays, so numpy functions often don't work with them. Use the scipy.sparse functions instead, such as sparse.vstack and sparse.hstack; but then all the inputs should be sparse as well.
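For the column-wise merge in the question (the equivalent of axis=1), a minimal sketch with sparse.hstack might look like this. The random cont_data and the small hand-built one-hot matrix are hypothetical stand-ins for the real data; the dense block is wrapped in sparse.csr_matrix first so every input is sparse, and format="csr" asks for a CSR result.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the numeric block: 6 rows, 3 columns
rng = np.random.default_rng(0)
cont_data = rng.normal(size=(6, 3))

# Hypothetical stand-in for the OneHotEncoder output:
# one category per row, 4 possible categories, stored as CSR
categories = np.array([0, 2, 1, 3, 2, 0])
disc_data_coded = sparse.csr_matrix(
    (np.ones(6), (np.arange(6), categories)), shape=(6, 4)
)

# Make the dense block sparse, then stack the blocks side by side
merged = sparse.hstack(
    (sparse.csr_matrix(cont_data), disc_data_coded), format="csr"
)

print(merged.shape)   # (6, 7)
print(merged.format)  # csr
```

The result stays sparse, which is useful when the merged matrix is passed straight to an estimator that accepts sparse input.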

Or make the sparse matrix dense first, with .toarray(), and use np.concatenate.
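A minimal sketch of the dense route, assuming small hypothetical stand-ins for cont_data and disc_data_coded: once .toarray() turns the sparse block into an ndarray, np.concatenate sees two 2-D arrays and the original axis=1 call works.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins: 4 rows of numeric data and a 4-column one-hot block
cont_data = np.arange(12, dtype=float).reshape(4, 3)
disc_data_coded = sparse.csr_matrix(np.eye(4))

# .toarray() returns a dense ndarray, so both inputs are now 2-D numpy arrays
merged = np.concatenate((cont_data, disc_data_coded.toarray()), axis=1)

print(merged.shape)  # (4, 7)
```

For the question's 24000x26 float64 block, densifying costs 24000 × 26 × 8 bytes ≈ 5 MB, so it is cheap here; the sparse route matters more when the encoder produces many thousands of columns.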

Do you want the result to be sparse or dense?

In [32]: sparse.vstack((sparse.csr_matrix(np.arange(10)),sparse.csr_matrix(np.ones((3,10)))))
Out[32]: 
<4x10 sparse matrix of type '<class 'numpy.float64'>'
    with 39 stored elements in Compressed Sparse Row format>
In [33]: np.concatenate((sparse.csr_matrix(np.arange(10)).toarray(),np.ones((3,10))))
Out[33]: 
array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

Tags: python, numpy, scikit-learn

htredleaf
