
Boolean Series key will be reindexed to match DataFrame index

Tags: python, pandas

Here is how I encountered the warning:

df.loc[a_list][df.a_col.isnull()] 

a_list is an Int64Index containing row labels, all of which belong to df.

The df.a_col.isnull() part is a condition I need for filtering.

If I execute the following commands individually, I do not get any warnings:

df.loc[a_list]
df[df.a_col.isnull()]

But if I chain them as df.loc[a_list][df.a_col.isnull()], I get the following warning (although the result is still returned):

Boolean Series key will be reindexed to match DataFrame index

What does this warning mean? Does it affect the result that is returned?

Cheng asked Jan 18 '17 03:01


1 Answer

Your approach will work despite the warning, but it's best not to rely on implicit, unclear behavior.

Solution 1, make the selection of indices in a_list a boolean mask:

df[df.index.isin(a_list) & df.a_col.isnull()] 

Solution 2, do it in two steps:

df2 = df.loc[a_list]
df2[df2.a_col.isnull()]

Solution 3, if you want a one-liner, use query with the trick that NaN is not equal to itself:

df.loc[a_list].query('a_col != a_col') 
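To check that the three solutions agree, here is a minimal sketch on made-up data. The frame contents and the labels in a_list are assumptions chosen for illustration; only the names df, a_col and a_list come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five rows, two of them with a missing a_col.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 13])  # row labels that all exist in df

# Solution 1: a single boolean mask over the full frame.
sol1 = df[df.index.isin(a_list) & df.a_col.isnull()]

# Solution 2: select the rows first, then build the mask from the subset.
df2 = df.loc[a_list]
sol2 = df2[df2.a_col.isnull()]

# Solution 3: NaN != NaN, so the query keeps only rows where a_col is missing.
sol3 = df.loc[a_list].query("a_col != a_col")

print(sol1.index.tolist(), sol2.index.tolist(), sol3.index.tolist())
```

One difference to keep in mind: Solution 1 returns rows in the order of df, while Solutions 2 and 3 follow the order (and any duplicates) of a_list. With a sorted, duplicate-free a_list as above, the results coincide.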

The warning arises because the boolean Series df.a_col.isnull() has the same length and index as df, while df.loc[a_list] has only as many rows as a_list, i.e. it is shorter. The mask therefore contains index labels that are not present in df.loc[a_list].

Pandas reindexes the boolean Series on the index of the calling DataFrame. In effect, it takes from df.a_col.isnull() only the values whose labels are in a_list. This works, but the behavior is implicit and could easily change in the future, which is what the warning is about.
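The alignment described above can also be written out explicitly. The sketch below uses made-up data (an assumption, not the asker's frame) and compares the chained expression with a mask that has been reindexed by hand. Any warning is captured rather than printed, since whether it is emitted may depend on the pandas version.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data mirroring the names used in the question.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 14])

subset = df.loc[a_list]    # 3 rows
mask = df.a_col.isnull()   # 5 entries, indexed like df

# Chained form from the question: pandas aligns the mask implicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = subset[mask]

# Explicit form: align the mask to the subset's index before filtering.
aligned = mask.reindex(subset.index)  # 3 entries, indexed like subset
explicit = subset[aligned]

print([str(w.message) for w in caught])  # any captured warning text
print(implicit.equals(explicit))
```

Because aligned already shares the index of subset, the explicit form gives pandas nothing to reindex, which is also why Solution 2 runs without the warning.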

IanS answered Oct 04 '22 04:10