
How is it possible to take a slice this way?




I'm taking a machine learning course, and it recommended the following code for balancing classes:

X_train_to_add = X_train[y_train.as_matrix() == 1, :][indices_to_add, :]

where y_train is a pandas DataFrame (converted here to a NumPy array via as_matrix()). I don't understand how a matrix can be used as an index for slicing.

bastak asked Nov 09 '22 11:11


1 Answer

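It might help to break the statement down into its component parts. (Note that as_matrix() was removed in pandas 1.0; on current versions, y_train.to_numpy() does the same job.) The statement is equivalent to the following sequence of statements: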

y = y_train.as_matrix()
row_mask = y == 1
X_masked = X_train[row_mask,:]
X_train_to_add = X_masked[indices_to_add, :]

Let's look at a concrete example. Suppose y, X_train, and indices_to_add have the following values:

>>> import numpy as np
>>> y = np.array([1, 2, -1, 1, 1])
>>> X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
>>> indices_to_add = np.array([2, 0])

First, we create a boolean array indicating which elements of y are equal to 1, which we'll call the "row mask".

>>> row_mask = y == 1
>>> row_mask
array([ True, False, False,  True,  True], dtype=bool)
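To make the comparison step concrete, here is a minimal sketch using the same example y. It only illustrates that y == 1 is evaluated element by element and yields a boolean array with the same shape as y; the variable name num_selected is introduced here purely for illustration.

```python
import numpy as np

y = np.array([1, 2, -1, 1, 1])

# Comparing an array with a scalar is done element by element,
# so the result is a boolean array with the same shape as y.
row_mask = y == 1

# np.equal is the function form of the == operator on arrays.
same_as_equal = np.array_equal(row_mask, np.equal(y, 1))

# Summing a boolean array counts its True entries, i.e. how many
# rows the mask will select (hypothetical helper variable).
num_selected = int(row_mask.sum())

print(row_mask.dtype, row_mask.shape)
print(same_as_equal, num_selected)
```

The number of True entries is also the number of rows that will survive the masking step.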

Next, we use the row mask to select the rows of X_train whose corresponding entries in row_mask are True (equivalently, the rows where y equals 1).

>>> X_masked = X_train[row_mask,:]
>>> X_masked
array([[ 1,  2,  3],
       [10, 11, 12],
       [13, 14, 15]])
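One way to think about boolean indexing is that it behaves like indexing with the positions of the True entries. The sketch below checks this on the example data using np.flatnonzero; the names true_positions, by_mask, and by_position are introduced here for illustration only.

```python
import numpy as np

X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
row_mask = np.array([True, False, False, True, True])

# Positions at which the mask is True.
true_positions = np.flatnonzero(row_mask)

# Select rows with the boolean mask, and with the equivalent integer positions.
by_mask = X_train[row_mask, :]
by_position = X_train[true_positions, :]

print(true_positions)
print(np.array_equal(by_mask, by_position))
```

Both forms pick out the same rows, which is why a boolean array of length 5 is a valid row index for a 5-row matrix.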

Finally, we use an array of indices to select certain rows from the previous result. Note that these indices refer to rows of X_masked, not the original matrix X_train.

>>> X_train_to_add = X_masked[indices_to_add,:]
>>> X_train_to_add
array([[13, 14, 15],
       [ 1,  2,  3]])
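Because the indices refer to the masked rows, it can be useful to translate them back to row numbers of the original matrix. The following is a small sketch of that translation on the example data; original_rows, direct, and chained are illustrative names, not part of the original code.

```python
import numpy as np

y = np.array([1, 2, -1, 1, 1])
X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
indices_to_add = np.array([2, 0])

row_mask = y == 1

# Map positions within the masked rows back to row numbers of X_train.
original_rows = np.flatnonzero(row_mask)[indices_to_add]

# Indexing X_train directly with those rows matches the chained expression.
direct = X_train[original_rows, :]
chained = X_train[row_mask, :][indices_to_add, :]

print(original_rows)
print(np.array_equal(direct, chained))
```

Here index 2 of the masked result corresponds to row 4 of X_train, and index 0 corresponds to row 0.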

You can find more examples of NumPy indexing in the documentation.

augurar answered Nov 14 '22 21:11
