
How to split a dataframe according to a boolean criterion?

Tags:

pandas

Suppose that df is a pandas dataframe. I want to split it into two dataframes according to some criterion. The best way I've found for doing this is something like

df0, df1 = [v for _, v in df.groupby(df['class'] != 'special')]

In the above example, the criterion is the argument to the groupby method. The resulting df0 is the sub-dataframe where the class field has the value 'special', and df1 is its complement. (Unfortunately, with this construct, the sub-dataframe of rows that fail the criterion is returned first, which is not intuitive.)
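To make the ordering concrete, here is a minimal sketch with made-up sample data (the `value` column and the rows are assumptions, only the `class` column comes from the question). It relies on groupby sorting its keys by default, so the False group comes before the True group. Collecting the groups in a dict keyed by the boolean avoids depending on that order.

```python
import pandas as pd

# Hypothetical sample data; only the 'class' column name comes from the question.
df = pd.DataFrame({
    "class": ["special", "plain", "special", "other"],
    "value": [1, 2, 3, 4],
})

criterion = df["class"] != "special"

# groupby sorts group keys by default (sort=True), so False is yielded before True.
key_order = [bool(key) for key, _ in df.groupby(criterion)]

# Keying the groups by the boolean makes the intent explicit.
groups = {bool(key): sub for key, sub in df.groupby(criterion)}
df0 = groups[False]  # rows where class == 'special'
df1 = groups[True]   # all other rows
```

Note that if every row falls on the same side of the criterion, groupby yields a single group, so unpacking into two names raises a ValueError and one of the dict keys is missing.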

The above construct has the drawback that it is not particularly readable, certainly not as readable as, for instance, some hypothetical splitby method like

df0, df1 = df.splitby(df['class'] == 'special')

Since splitting a dataframe like this is something I often need to do, I figure that there may be a built-in function, or maybe an established idiom, for doing this. If so, please let me know.

kjo asked Feb 19 '13 12:02



1 Answer

I think the most readable way to do this is:

m = df['class'] != 'special'
a, b = df[m], df[~m]

I haven't come across a special method for this...
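If the two-line mask idiom is needed often, it could be wrapped in a small helper. The following is only a sketch: `split_by` is a hypothetical name, not a pandas method, and the sample data is invented for illustration.

```python
import pandas as pd


def split_by(df, mask):
    """Return (rows where mask is True, rows where mask is False).

    Hypothetical helper, not part of pandas. `mask` is assumed to be a
    boolean Series aligned with df's index.
    """
    return df[mask], df[~mask]


# Hypothetical sample data; only the 'class' column name comes from the question.
df = pd.DataFrame({
    "class": ["special", "plain", "special", "other"],
    "value": [1, 2, 3, 4],
})

# Matching rows come first, which reads more naturally than the groupby order.
special, rest = split_by(df, df["class"] == "special")

# When nothing matches, the first frame is simply empty.
matched, unmatched = split_by(df, df["class"] == "missing")
```

Because each side is selected independently, the helper always returns two dataframes, even when one of them is empty.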

Andy Hayden answered Oct 20 '22 13:10
