
Deleting rows from numpy array not working

I am trying to split my numpy array of data points into test and training sets. To do that, I'm randomly selecting rows from the array to use as the training set, and the remaining rows become the test set.

This is my code:

import numpy

matrix = numpy.loadtxt("matrix_vals.data", delimiter=',', dtype=float)
matrix_rows, matrix_cols = matrix.shape

# training set
randvals = numpy.random.randint(matrix_rows, size=50)
train = matrix[randvals,:]
test = numpy.delete(matrix, randvals, 0)

print("matrix.shape:", matrix.shape)
print("train.shape:", train.shape)
print("test.shape:", test.shape)

But the output I get is:

matrix.shape: (130, 14)
train.shape: (50, 14)
test.shape: (89, 14)

This is obviously wrong: the number of rows in train and test should add up to the 130 rows in the matrix, but here they add up to 139. Can anyone help me figure out what's going wrong?

SanjanaS801 asked Jan 28 '26 02:01


1 Answer

Because you are generating random integers with replacement, randvals will almost certainly contain repeated indices.

Indexing with repeated indices will return the same row multiple times, so matrix[randvals, :] is guaranteed to give you an output with exactly 50 rows, regardless of whether some of them are repeated.

In contrast, np.delete(matrix, randvals, 0) removes each distinct row index only once, so the number of rows drops only by the number of unique values in randvals.

Try comparing:

import numpy as np

print(np.unique(randvals).shape[0] == matrix_rows - test.shape[0])
# True
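The mismatch is easier to see on a tiny array. This is only an illustrative sketch: the 6x2 matrix and the hand-picked idx array are made up for the demonstration and stand in for the randint output in the question.

```python
import numpy as np

# Toy 6x2 matrix so the row counts are easy to follow (made-up data).
matrix = np.arange(12).reshape(6, 2)

# Hand-picked indices with repeats, standing in for the randint output.
idx = np.array([0, 2, 2, 5, 5, 5])

train = matrix[idx, :]            # one output row per index, repeats included
test = np.delete(matrix, idx, 0)  # rows 0, 2 and 5 are each removed once

print(train.shape)          # (6, 2)
print(test.shape)           # (3, 2)
print(np.unique(idx).size)  # 3
```

Here train and test together have 9 rows even though the matrix only has 6, which is the same effect as the 139 vs. 130 in the question.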

To generate a vector of unique random indices between 0 and matrix_rows - 1, you could use np.random.choice with replace=False:

uidx = np.random.choice(matrix_rows, size=50, replace=False)

Then matrix[uidx].shape[0] + np.delete(matrix, uidx, 0).shape[0] == matrix_rows.
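Putting it together, a split along these lines might look like the sketch below. The split_rows helper, its seed parameter and the synthetic 130x14 array are hypothetical names and data used for illustration only. It also uses the newer np.random.default_rng generator in place of np.random.choice; both accept replace=False.

```python
import numpy as np

def split_rows(matrix, n_train, seed=None):
    """Split rows into disjoint train/test sets (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_rows = matrix.shape[0]
    # Sample row indices without replacement so none is repeated.
    train_idx = rng.choice(n_rows, size=n_train, replace=False)
    train = matrix[train_idx]
    # Every index is unique, so exactly n_train rows are removed.
    test = np.delete(matrix, train_idx, axis=0)
    return train, test

# Synthetic stand-in for the data loaded from matrix_vals.data.
data = np.arange(130 * 14, dtype=float).reshape(130, 14)
train, test = split_rows(data, n_train=50, seed=0)
print(train.shape, test.shape)  # (50, 14) (80, 14)
```

With unique indices the two row counts always add up to the number of rows in the original matrix.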

ali_m answered Jan 30 '26 18:01


