My dataset has close to 200 rows, but as a minimal working example, let's assume the following array:
arr = np.array([[1,2,3,4], [5,6,7,8],
[9,10,11,12], [13,14,15,16],
[17,18,19,20], [21,22,23,24]])
I can take a random sampling of 3 of the rows as follows:
indexes = np.random.choice(np.arange(arr.shape[0]), int(arr.shape[0]/2), replace=False)
Using these indexes, I can select my test cases as follows:
testing = arr[indexes]
I want to delete the rows at these indexes so that I can use the remaining rows as my training set.
From the post here, it seems that training = np.delete(arr, indexes)
ought to do it, but I get a 1-D array instead.
I also tried the suggestion here, using training = arr[indexes.astype(np.bool)],
but it did not give a clean separation: the row [5,6,7,8] ends up in both the training and testing sets.
training = arr[indexes.astype(np.bool)]
testing
Out[101]:
array([[13, 14, 15, 16],
[ 5, 6, 7, 8],
[17, 18, 19, 20]])
training
Out[102]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
Any idea what I am doing wrong? Thanks.
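Without an axis argument, np.delete works on the flattened array, which is why you got a 1-D result. To delete the indexed rows from a NumPy array, pass axis=0:
training = np.delete(arr, indexes, axis=0)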
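As a rough sketch of the split described in the question, the snippet below rebuilds the same 6x4 array, draws three row indexes, and forms the training set in two ways: with np.delete along axis 0, and with a boolean mask that has one entry per row. The seeded generator and the names mask, training_masked and flat are assumptions added for illustration only; they are not from the original post. Note that casting the integer indexes with astype(bool) does not build such a mask: it only turns every nonzero index into True.

```python
import numpy as np

# Same 6x4 array as in the question (values 1..24).
arr = np.arange(1, 25).reshape(6, 4)

# Seed chosen only to make the sketch reproducible (an assumption).
rng = np.random.default_rng(0)
indexes = rng.choice(arr.shape[0], size=arr.shape[0] // 2, replace=False)

# Rows picked for testing.
testing = arr[indexes]

# Option 1: axis=0 makes np.delete drop whole rows.
training = np.delete(arr, indexes, axis=0)

# Option 2: a boolean mask with one entry per row; False marks test rows.
mask = np.ones(arr.shape[0], dtype=bool)
mask[indexes] = False
training_masked = arr[mask]

# For comparison: without axis, np.delete flattens the array first.
flat = np.delete(arr, indexes)

print(training.shape)  # (3, 4)
print(flat.ndim)       # 1
```

Both options leave arr itself unchanged, so testing and training can be taken from the same original array in either order.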