
Pandas: remove group from the data when a value in the group meets a required condition

I have groups of values in my data. Within each group, I would like to check whether any value is below 8. If so, the entire group should be removed from the data set.

Please note that the value I'm checking lies in a different column from the grouping column.

Example Input:

Groups Count
  1      7
  1      11
  1      9 
  2      12
  2      15
  2      21 

Output:

Groups Count
  2      12
  2      15
  2      21 
nrcjea001 asked Jan 09 '16 07:01



2 Answers

Based on what you described in the question, a group should be dropped as soon as at least one value within it is below 8. Equivalently, a group should be dropped whenever its minimum value is below 8, and kept otherwise.

Using the groupby filter feature (see the Filtration section of the pandas groupby documentation), the actual code reduces to essentially one line. You may use the following code:
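The equivalence between "at least one value below 8" and "minimum below 8" can be checked directly on the example data. This is only a minimal sketch, assuming the question's data is loaded into a DataFrame called df; the names threshold, any_below and min_below are illustrative and not part of the original answer.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'Groups': [1, 1, 1, 2, 2, 2],
                   'Count': [7, 11, 9, 12, 15, 21]})

threshold = 8  # cut-off taken from the question

# Per group: is at least one Count below the threshold?
any_below = df['Count'].lt(threshold).groupby(df['Groups']).any()

# Per group: is the minimum Count below the threshold?
min_below = df.groupby('Groups')['Count'].min().lt(threshold)

print(any_below)
print(min_below)
```

Both Series flag group 1 and leave group 2 unflagged, so either test can be used to decide which groups to drop.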

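import pandas as pd

df = pd.DataFrame({'Groups': [1, 1, 1, 2, 2, 2], 'Count': [7, 11, 9, 12, 15, 21]})
# keep a group only if its minimum is at least 8, i.e. no value is below 8
dfnew = df.groupby('Groups').filter(lambda x: x['Count'].min() >= 8)
dfnew = dfnew.reset_index(drop=True)  # reset index
dfnew = dfnew[['Groups', 'Count']]  # rearrange the column sequence
print(dfnew)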

Output:
   Groups  Count
0       2     12
1       2     15
2       2     21
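The same idea can also be written without a lambda by broadcasting each group's minimum back onto its rows with transform. The following is a hedged sketch rather than part of the original answer: group 3 is an added, purely illustrative case whose minimum is exactly 8, included to show that a comparison of >= 8 keeps a group that has no value below 8.

```python
import pandas as pd

# Question's data plus an illustrative group 3 (not from the question)
# whose minimum is exactly 8
df = pd.DataFrame({'Groups': [1, 1, 1, 2, 2, 2, 3, 3],
                   'Count': [7, 11, 9, 12, 15, 21, 8, 10]})

# Minimum Count of each row's group, aligned with the original rows
group_min = df.groupby('Groups')['Count'].transform('min')

# Keep rows whose group has no value below 8
kept = df[group_min >= 8].reset_index(drop=True)
print(kept)
```

Only group 1 is removed here; with a strict > 8 comparison, group 3 would be dropped as well, even though none of its values is below 8.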
2342G456DI8 answered Nov 15 '22 16:11


You can use isin, loc and unique, then select the subset with an inverted boolean mask. Finally, you can call reset_index:

print(df)

  Groups  Count
0       1      7
1       1     11
2       1      9
3       2     12
4       2     15
5       2     21

print(df.loc[df['Count'] < 8, 'Groups'].unique())
[1]

print(~df['Groups'].isin(df.loc[df['Count'] < 8, 'Groups'].unique()))

0    False
1    False
2    False
3     True
4     True
5     True
Name: Groups, dtype: bool

df1 = df[~df['Groups'].isin(df.loc[df['Count'] < 8, 'Groups'].unique())]
print(df1.reset_index(drop=True))

   Groups  Count
0       2     12
1       2     15
2       2     21
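The steps above can be wrapped into a small reusable function. This is only a sketch under stated assumptions: the function name drop_groups_with_value_below and its parameters are hypothetical and introduced here for illustration; the body uses the same loc, unique and isin calls shown above.

```python
import pandas as pd


def drop_groups_with_value_below(df, group_col, value_col, threshold):
    """Return df without any group that has a value below threshold."""
    # Labels of groups containing at least one value below the threshold
    bad_groups = df.loc[df[value_col] < threshold, group_col].unique()
    # Keep rows whose group label is not among the flagged groups
    return df[~df[group_col].isin(bad_groups)].reset_index(drop=True)


# Example data from the question
df = pd.DataFrame({'Groups': [1, 1, 1, 2, 2, 2],
                   'Count': [7, 11, 9, 12, 15, 21]})

result = drop_groups_with_value_below(df, 'Groups', 'Count', 8)
print(result)
```

The function returns a new DataFrame and leaves the input unchanged, which mirrors how df1 is built from df above.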
jezrael answered Nov 15 '22 16:11