I'm trying to select a subset of a subset of a dataframe, selecting only some columns, and filtering on the rows.
df.loc[df.a.isin(['Apple', 'Pear', 'Mango']), ['a', 'b', 'f', 'g']]
However, I'm getting the error:
Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.
What's the correct way to slice and filter now?
To slice the columns, the syntax is df.loc[:, start:stop:step], where start is the name of the first column to take, stop is the name of the last column to take (label slices include the stop), and step is the number of positions to advance after each extraction; for example, a step of 2 selects alternate columns.
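As a rough sketch of the label-based column slice described above, here is a small example. The frame and its column names a through g are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical frame with columns a..g (assumed for illustration only).
df = pd.DataFrame({name: range(3) for name in "abcdefg"})

# Label slice with a step of 2: takes every other column from 'a' through 'g'.
every_other = df.loc[:, "a":"g":2]

# Label slice without a step: the stop label 'd' is included in the result.
block = df.loc[:, "b":"d"]

print(list(every_other.columns))
print(list(block.columns))
```

Note that this is slicing by label, so both endpoints are kept, unlike ordinary Python slices.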
Slicing Rows and Columns by Index Position
When slicing by index position in Pandas, the start index is included in the output, but the stop index is excluded, so it must be one step beyond the last row you want to select. A row slice of 0:2 therefore returns row 0 and row 1, but does not return row 2. A second slice of [:] indicates that all columns are required.
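To make the positional rule concrete, here is a minimal sketch using .iloc. The frame below is invented for the example, and the 0:2 row slice is an assumption about what the passage had in mind.

```python
import pandas as pd

# Hypothetical three-row frame (values chosen only for illustration).
df = pd.DataFrame({"A": [7.0, 3.0, 8.0], "B": [1, 2, 3], "C": [8, 5, 7]})

# Positional slice: rows 0 and 1 are returned, row 2 (the stop) is excluded.
# The second slice, ':', keeps every column.
first_two = df.iloc[0:2, :]

print(first_two)
```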
When you create a DataFrame in Pandas, the DataFrame will automatically have certain properties. Specifically, each row and each column will have an integer “location” in the DataFrame. These integer locations for the rows and columns start at zero.
This is a change introduced in v0.21.1, and has been explained in the docs at length:
Previously, selecting with a list of labels, where one or more labels were missing would always succeed, returning NaN for missing labels. This will now show a FutureWarning. In the future this will raise a KeyError (GH15747). This warning will trigger on a DataFrame or a Series for using .loc[] or [[]] when passing a list-of-labels with at least 1 missing label.
For example,
df
     A    B  C
0  7.0  NaN  8
1  3.0  3.0  5
2  8.0  1.0  7
3  NaN  0.0  3
4  8.0  2.0  7
Try some kind of slicing as you're doing -
df.loc[df.A.gt(6), ['A', 'C']]
     A  C
0  7.0  8
2  8.0  7
4  8.0  7
No problem. Now, try replacing C with a non-existent column label -
df.loc[df.A.gt(6), ['A', 'D']]
FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.
     A   D
0  7.0 NaN
2  8.0 NaN
4  8.0 NaN
So, in your case, the error is because of the column labels you pass to loc: at least one of 'a', 'b', 'f', 'g' does not exist in df.columns. Take another look at them.
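If it helps, here is one way you might track down the offending label and then select safely. This is only a sketch: it reuses the small example frame from this answer rather than your data, and the names wanted, missing, and present are made up for illustration. It shows two options, either dropping the labels that don't exist or using .reindex() as the warning suggests.

```python
import numpy as np
import pandas as pd

# Example frame from the answer above (not the asker's real data).
df = pd.DataFrame({
    "A": [7.0, 3.0, 8.0, np.nan, 8.0],
    "B": [np.nan, 3.0, 1.0, 0.0, 2.0],
    "C": [8, 5, 7, 3, 7],
})

wanted = ["A", "D"]  # 'D' is deliberately absent from the frame
mask = df["A"].gt(6)

# Step 1: find which requested labels are not actual columns.
missing = [col for col in wanted if col not in df.columns]

# Option 1: select only the labels that really exist.
present = [col for col in wanted if col in df.columns]
subset_existing = df.loc[mask, present]

# Option 2: filter the rows first, then reindex the columns;
# labels that do not exist come back as all-NaN columns.
subset_reindexed = df.loc[mask].reindex(columns=wanted)

print(missing)
print(subset_existing)
print(subset_reindexed)
```

Which option fits depends on whether you actually want a NaN-filled column for the missing label or would rather fix the label list.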