I'm taking a machine learning course, and it recommended the following code for balancing classes:
X_train_to_add = X_train[y_train.as_matrix() == 1, :][indices_to_add, :]
where y_train is a pandas dataframe (converted there to a NumPy array via as_matrix()). I don't understand how a matrix can be used as an index for slicing.
It might help to break down the statement into its component parts. This statement is equivalent to the following sequence of statements:
y = y_train.as_matrix()
row_mask = y == 1
X_masked = X_train[row_mask,:]
X_train_to_add = X_masked[indices_to_add, :]
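One practical caveat: as_matrix() was deprecated and then removed in pandas 1.0, so on a current installation the same idea would probably be written with to_numpy(). The sketch below is only an illustration under assumptions: it treats y_train as a pandas Series (1-D), and the data values are hypothetical, chosen to match the example that follows.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen to match the example values used in this answer.
y_train = pd.Series([1, 2, -1, 1, 1])
X_train = np.arange(1, 16).reshape(5, 3)  # rows [1,2,3], [4,5,6], ..., [13,14,15]
indices_to_add = np.array([2, 0])

# to_numpy() returns a 1-D NumPy array for a Series.
# Assumption: if y_train were a single-column DataFrame instead, the result
# would be 2-D and would need flattening first, e.g. y_train.to_numpy().ravel().
row_mask = y_train.to_numpy() == 1

# Same chained indexing as in the question, minus the removed as_matrix() call.
X_train_to_add = X_train[row_mask, :][indices_to_add, :]
print(X_train_to_add)
```

The mask has to be 1-D with one entry per row of X_train for the row selection to work as described here.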
Let's look at a concrete example. Suppose y, X_train, and indices_to_add have the following values:
>>> import numpy as np
>>> y = np.array([1, 2, -1, 1, 1])
>>> X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
>>> indices_to_add = np.array([2, 0])
First, we create a boolean array indicating which elements of y are equal to 1, which we'll call the "row mask".
>>> row_mask = y == 1
>>> row_mask
array([ True, False, False, True, True], dtype=bool)
Next, we use the row mask to select the rows of X_train for which the corresponding values of row_mask are True (or equivalently, the rows for which the corresponding values of y are equal to 1).
>>> X_masked = X_train[row_mask,:]
>>> X_masked
array([[ 1,  2,  3],
       [10, 11, 12],
       [13, 14, 15]])
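One way to build intuition for the boolean mask is to compare it with plain integer indexing. The following sketch, which reuses the example values from this answer, suggests that selecting rows with a boolean mask gives the same result as selecting the positions where the mask is True. The names true_rows, by_mask, and by_index are made up for illustration.

```python
import numpy as np

X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
row_mask = np.array([True, False, False, True, True])

# Positions at which the mask is True; here these are rows 0, 3 and 4.
true_rows = np.flatnonzero(row_mask)

# Selecting with the boolean mask...
by_mask = X_train[row_mask, :]
# ...matches selecting the same rows by their integer positions.
by_index = X_train[true_rows, :]

print(true_rows)
print(np.array_equal(by_mask, by_index))
```

The mask needs one boolean per row of X_train; the rows come back in their original order.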
Finally, we use an array of indices to select certain rows from the previous result. Note that these indices refer to rows of X_masked, not the original matrix X_train.
>>> X_train_to_add = X_masked[indices_to_add,:]
>>> X_train_to_add
array([[13, 14, 15],
       [ 1,  2,  3]])
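Because indices_to_add counts rows of the masked result, it can be useful to translate those positions back into row numbers of X_train. The sketch below shows one possible way to do that with the same example values, and checks that it agrees with the original one-line statement. The names kept_rows, original_rows, one_liner, and direct are hypothetical.

```python
import numpy as np

y = np.array([1, 2, -1, 1, 1])
X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
indices_to_add = np.array([2, 0])

# Rows of X_train that survive the mask, in order: rows 0, 3 and 4.
kept_rows = np.flatnonzero(y == 1)

# Translate positions within the masked result into positions within X_train.
# Position 2 of the masked result is row 4 of X_train; position 0 is row 0.
original_rows = kept_rows[indices_to_add]

# The chained statement from the question, and the equivalent direct lookup.
one_liner = X_train[y == 1, :][indices_to_add, :]
direct = X_train[original_rows, :]

print(original_rows)
print(np.array_equal(one_liner, direct))
```

Both forms select the same rows; the chained version is shorter, while the direct version makes it explicit which rows of X_train are being duplicated.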
You can find more examples in the NumPy documentation on indexing; the two techniques used here are called boolean array indexing and integer array indexing.