Deleting rows from numpy array not working

Question

I am trying to split my numpy array of data points into test and training sets. To do that, I randomly select rows from the array to use as the training set, and the remaining rows form the test set.

This is my code:

import numpy

matrix = numpy.loadtxt("matrix_vals.data", delimiter=',', dtype=float)
matrix_rows, matrix_cols = matrix.shape

# training set
randvals = numpy.random.randint(matrix_rows, size=50)
train = matrix[randvals,:]
test = numpy.delete(matrix, randvals, 0)

print("matrix.shape:", matrix.shape)
print("train.shape:", train.shape)
print("test.shape:", test.shape)

But the output I get is:

matrix.shape: (130, 14)
train.shape: (50, 14)
test.shape: (89, 14)

This is obviously wrong: the number of rows in train and test should add up to the total number of rows in the matrix, but here the sum (50 + 89 = 139) is more than 130. Can anyone help me figure out what's going wrong?

ali_m · Accepted Answer

Because you are generating random integers with replacement, randvals will almost certainly contain repeated indices.

Indexing with repeated indices will return the same row multiple times, so matrix[randvals, :] is guaranteed to give you an output with exactly 50 rows, regardless of whether some of them are repeated.

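In contrast, np.delete(matrix, randvals, 0) removes each listed row only once, no matter how many times its index appears, so it reduces the number of rows only by the number of unique values in randvals. In your output, 130 - 89 = 41 rows were removed, which means randvals contained 41 unique values out of 50.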

Try comparing (with numpy imported as np):

print(np.unique(randvals).shape[0] == matrix_rows - test.shape[0])
# True
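
To see the mismatch on a small scale, here is a minimal sketch. The 5x2 array and the index list are made up for illustration and are not the data from the question:

```python
import numpy as np

# Hypothetical 5x2 array standing in for the question's data
matrix = np.arange(10).reshape(5, 2)
idx = np.array([0, 2, 2, 4])  # index 2 appears twice

train = matrix[idx]                    # fancy indexing keeps the duplicate: 4 rows
test = np.delete(matrix, idx, axis=0)  # rows 0, 2 and 4 are each removed once: 2 rows left

print(train.shape[0], test.shape[0], np.unique(idx).size)
# 4 2 3
```

Here train and test together hold 6 rows even though the array only has 5, which is the same effect as in the question.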

To generate a vector of unique random indices between 0 and matrix_rows - 1, you could use np.random.choice with replace=False:

uidx = np.random.choice(matrix_rows, size=50, replace=False)

Then matrix[uidx].shape[0] + np.delete(matrix, uidx, 0).shape[0] == matrix_rows.
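
Putting it together, a full split might look like the sketch below. It assumes a synthetic 130x14 array in place of matrix_vals.data, and it uses the newer np.random.default_rng generator with an arbitrary seed, which is my choice rather than something from the question. The second option, shuffling all row indices once and slicing, is an alternative that avoids np.delete altogether:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily, for reproducibility
matrix = rng.random((130, 14))   # synthetic stand-in for matrix_vals.data
n_rows = matrix.shape[0]

# Option 1: unique indices plus np.delete, as described above
uidx = rng.choice(n_rows, size=50, replace=False)
train = matrix[uidx]
test = np.delete(matrix, uidx, axis=0)

# Option 2 (alternative): shuffle all row indices once, then slice
perm = rng.permutation(n_rows)
train2, test2 = matrix[perm[:50]], matrix[perm[50:]]

print(train.shape, test.shape)
# (50, 14) (80, 14)
```

Either way the two sets are disjoint and their row counts add up to matrix_rows.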

Tags: python, arrays, matrix, numpy

SanjanaS801
