Pandas slicing FutureWarning with 0.21.0

I'm trying to select a subset of a DataFrame: keep only some of the columns and filter the rows.

    df.loc[df.a.isin(['Apple', 'Pear', 'Mango']), ['a', 'b', 'f', 'g']]

However, I'm getting this warning:

    FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.

What's the correct way to slice and filter now?

QuinRiva asked Dec 19 '17 22:12


People also ask

How do you slice a DataFrame for specific columns?

To slice columns, use df.loc[:, start:stop:step], where start is the name of the first column to take, stop is the name of the last column to take (with .loc the stop label is included), and step is the number of columns to advance after each extraction; for example, a step of 2 selects every other column.
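A minimal sketch of label-based column slicing, assuming a hypothetical frame whose columns are named a through f. It shows that the stop label is inclusive with .loc and how the step skips columns.

```python
import pandas as pd

# Hypothetical frame with six columns named a..f (for illustration only)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
                  columns=list("abcdef"))

# Label slice: 'b' through 'e', both ends included
middle = df.loc[:, "b":"e"]

# Every other column between 'a' and 'f' (step of 2)
alternate = df.loc[:, "a":"f":2]

print(list(middle.columns))
print(list(alternate.columns))
```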

How do you slice the Pandas index?

Slicing rows and columns by index position: when slicing by position in pandas (for example with .iloc), the start index is included in the output, but the stop index is excluded; it is one step beyond the last row you want to select. So a slice such as df.iloc[0:2, :] returns row 0 and row 1, but not row 2. The second slice, :, indicates that all columns are required.
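A short sketch contrasting positional and label-based row slicing, using a small hypothetical frame with the default integer index. Positional slicing with .iloc excludes the stop position, while .loc treats the same numbers as labels and includes the stop label.

```python
import pandas as pd

# Hypothetical frame; its default RangeIndex labels are 0, 1, 2, 3
df = pd.DataFrame({"x": [10, 20, 30, 40], "y": [1, 2, 3, 4]})

# Positional: start 0 included, stop 2 excluded -> rows 0 and 1, all columns
first_two = df.iloc[0:2, :]

# Label-based: here 0 and 2 are labels, and the stop label is included -> rows 0, 1, 2
by_label = df.loc[0:2, :]

print(first_two)
print(by_label)
```

The difference only shows up because the labels happen to look like positions; with non-integer labels, .loc[0:2] would not be a positional slice at all.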

Do Pandas columns start at 0?

When you create a DataFrame in pandas, each row and each column automatically gets an integer “location” (its position), separate from its label. These integer locations for the rows and columns start at zero, whatever the labels are.
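To illustrate that positions start at zero even when labels do not, here is a sketch with a hypothetical frame whose row labels begin at 100. Position 0 still refers to the first row, and Index.get_loc translates a column label into its position.

```python
import pandas as pd

# Hypothetical frame whose row labels deliberately do not start at 0
df = pd.DataFrame({"a": [5, 6, 7], "b": [8, 9, 10]}, index=[100, 101, 102])

# Position 0 is the first row, which carries the label 100
first_row = df.iloc[0]

# Position of column 'a' among the columns
col_pos = df.columns.get_loc("a")

print(first_row.name, col_pos)
```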


1 Answer

TL;DR: There is likely a typo or spelling error in the column header names.

This is a change introduced in v0.21.0, and it is explained at length in the docs:

Previously, selecting with a list of labels, where one or more labels were missing would always succeed, returning NaN for missing labels. This will now show a FutureWarning. In the future this will raise a KeyError (GH15747). This warning will trigger on a DataFrame or a Series for using .loc[] or [[]] when passing a list-of-labels with at least 1 missing label.

For example,

    df

         A    B  C
    0  7.0  NaN  8
    1  3.0  3.0  5
    2  8.0  1.0  7
    3  NaN  0.0  3
    4  8.0  2.0  7

Try slicing the same way you're doing:

    df.loc[df.A.gt(6), ['A', 'C']]

         A  C
    0  7.0  8
    2  8.0  7
    4  8.0  7
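The example frame and the filter above can be reproduced as follows. The values are copied from the printed frame; the dtypes (float for A and B because of the NaN values, integer for C) are inferred from that output.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the printed output
df = pd.DataFrame({
    "A": [7.0, 3.0, 8.0, np.nan, 8.0],
    "B": [np.nan, 3.0, 1.0, 0.0, 2.0],
    "C": [8, 5, 7, 3, 7],
})

# Boolean row mask (A > 6; NaN compares as False) plus a list of existing column labels
subset = df.loc[df.A.gt(6), ["A", "C"]]
print(subset)
```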

No problem. Now, try replacing C with a non-existent column label:

    df.loc[df.A.gt(6), ['A', 'D']]
    FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.

         A   D
    0  7.0 NaN
    2  8.0 NaN
    4  8.0 NaN
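A sketch of what happens once the deprecation is complete and of the .reindex() alternative the warning mentions. It assumes pandas 1.0 or later, where a label list containing a missing label raises KeyError. If you really do want the old NaN-filled column, one option is to filter the rows with .loc first and then reindex the columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [7.0, 3.0, 8.0, np.nan, 8.0],
    "B": [np.nan, 3.0, 1.0, 0.0, 2.0],
    "C": [8, 5, 7, 3, 7],
})

# Assuming pandas >= 1.0: the missing label 'D' raises KeyError
try:
    df.loc[df.A.gt(6), ["A", "D"]]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Alternative: filter rows, then reindex columns; missing labels become all-NaN columns
result = df.loc[df.A.gt(6)].reindex(columns=["A", "D"])
print(result)
```

Reindexing makes the "fill missing labels with NaN" intent explicit, which is the behaviour the old list-based selection used to provide silently.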

So, in your case, the warning is caused by the column labels you pass to loc: at least one of them is not a column of your DataFrame. Take another look at them.
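One way to find the offending label is to diff the requested labels against df.columns. The frame below is a hypothetical stand-in for the asker's data, with a trailing space in the header "g " as one assumed example of a label that looks right but does not match; the whitespace fix only applies if that is actually the cause.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: header "g " has a trailing space
df = pd.DataFrame({
    "a": ["Apple", "Kiwi", "Mango"],
    "b": [1, 2, 3],
    "f": [4, 5, 6],
    "g ": [7, 8, 9],
})

wanted = ["a", "b", "f", "g"]

# Labels requested in the .loc call that are not columns of df
missing = pd.Index(wanted).difference(df.columns)
print(list(missing))

# Possible fix when the cause is stray whitespace: normalise the headers
df.columns = df.columns.str.strip()

# With every label present, the original selection works without a warning
subset = df.loc[df.a.isin(["Apple", "Pear", "Mango"]), wanted]
print(subset)
```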

cs95 answered Oct 14 '22 13:10