I have groupings of values in the data, and within each group I would like to check whether any value is below 8. If this condition is met, the entire group should be removed from the data set.
Please note that the value I'm referring to is in a different column from the grouping column.
Example Input:
Groups Count
1 7
1 11
1 9
2 12
2 15
2 21
Output:
Groups Count
2 12
2 15
2 21
A pandas Series has a drop() method that removes the entries with the specified index labels. It does not modify the original Series; instead, it returns a new Series without those entries.
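A minimal sketch of that behaviour, assuming a small Series whose labels and values are made up for illustration:

```python
import pandas as pd

# Hypothetical example data; the labels "a", "b", "c" are illustrative only.
s = pd.Series([7, 11, 9], index=["a", "b", "c"])

# drop() returns a new Series without the entry labelled "a".
trimmed = s.drop("a")

print(trimmed.index.tolist())  # ['b', 'c']
print(s.index.tolist())        # ['a', 'b', 'c'], the original is unchanged
```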
The values property returns a NumPy representation of the DataFrame. Only the values are returned; the axis labels are dropped. A DataFrame whose columns all share the same type (e.g., int64) yields an array of that type.
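As a rough illustration, using the Groups/Count data from the question (both columns hold integers, so the resulting array should be an integer array):

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({"Groups": [1, 1, 1, 2, 2, 2],
                   "Count": [7, 11, 9, 12, 15, 21]})

# .values returns a plain NumPy array; column names and the index are not included.
arr = df.values

print(arr.shape)        # (6, 2)
print(arr[0].tolist())  # [1, 7]
```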
Pandas DataFrame add() method: the add() method adds a specified value to each value in the DataFrame. The specified value must be an object that can be added to the values of the DataFrame.
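For instance, adding a scalar to a single-column frame might look like the sketch below; the data and the value 5 are assumptions chosen only for illustration:

```python
import pandas as pd

# Hypothetical one-column frame.
df = pd.DataFrame({"Count": [7, 11, 9]})

# add() returns a new DataFrame with 5 added to every value.
shifted = df.add(5)

print(shifted["Count"].tolist())  # [12, 16, 14]
print(df["Count"].tolist())       # [7, 11, 9], the original is unchanged
```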
A groupby operation involves (1) splitting the data into groups, (2) applying a function to each group independently, and (3) combining the results into a data structure.
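The three steps can be sketched on the question's data as follows; choosing min as the applied function is an assumption made here because it fits the problem at hand:

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({"Groups": [1, 1, 1, 2, 2, 2],
                   "Count": [7, 11, 9, 12, 15, 21]})

# (1) Split: rows are partitioned by the value in "Groups".
grouped = df.groupby("Groups")

# (2) Apply min() to each group's "Count" values, and
# (3) combine the per-group results into one Series indexed by group.
group_min = grouped["Count"].min()

print(group_min.index.tolist())  # [1, 2]
print(group_min.tolist())        # [7, 12]
```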
Based on what you described in the question, a group should be dropped as long as at least one value within it is below 8. An equivalent statement is that a group should be dropped whenever its minimum value is below 8.
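A quick way to convince yourself of that equivalence is to compute both conditions per group and compare them. This is only a sketch on the question's data; the variable names are illustrative:

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({"Groups": [1, 1, 1, 2, 2, 2],
                   "Count": [7, 11, 9, 12, 15, 21]})
threshold = 8

# Condition A: does the group contain at least one value below the threshold?
any_below = (df["Count"] < threshold).groupby(df["Groups"]).any()

# Condition B: is the group's minimum below the threshold?
min_below = df.groupby("Groups")["Count"].min() < threshold

print(any_below.tolist())  # [True, False]
print(min_below.tolist())  # [True, False]
```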
Using the groupby filter method, the code reduces to a single line (see the Filtration section of the pandas groupby documentation). A group is kept only if its minimum is not below 8:
dfnew = df.groupby('Groups').filter(lambda x: x['Count'].min() >= 8)
dfnew.reset_index(drop=True, inplace=True) # reset index
dfnew = dfnew[['Groups','Count']] # rearrange the column sequence
print(dfnew)
Output:
Groups Count
0 2 12
1 2 15
2 2 21
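The boundary case is worth checking. The sketch below assumes "below 8" means strictly less than 8, so a group whose smallest value is exactly 8 is kept, which calls for >= rather than > in the filter. Group 3 is a hypothetical addition to the question's data, included only to exercise that case:

```python
import pandas as pd

# The question's data plus a hypothetical group 3 whose minimum is exactly 8.
df = pd.DataFrame({"Groups": [1, 1, 1, 2, 2, 2, 3, 3],
                   "Count": [7, 11, 9, 12, 15, 21, 8, 10]})

# Keep a group only when none of its values is below 8.
kept = (df.groupby("Groups")
          .filter(lambda g: g["Count"].min() >= 8)
          .reset_index(drop=True))

print(kept["Groups"].unique().tolist())  # [2, 3]
print(kept["Count"].tolist())            # [12, 15, 21, 8, 10]
```

With a strict > comparison, group 3 would be removed as well, even though none of its values is below 8.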
You can use isin, loc and unique to select the subset with an inverted mask. Finally, you can call reset_index:
print(df)
Groups Count
0 1 7
1 1 11
2 1 9
3 2 12
4 2 15
5 2 21
print(df.loc[df['Count'] < 8, 'Groups'].unique())
[1]
print(~df['Groups'].isin(df.loc[df['Count'] < 8, 'Groups'].unique()))
0 False
1 False
2 False
3 True
4 True
5 True
Name: Groups, dtype: bool
df1 = df[~df['Groups'].isin(df.loc[df['Count'] < 8, 'Groups'].unique())]
print(df1.reset_index(drop=True))
Groups Count
0 2 12
1 2 15
2 2 21
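The same steps can be wrapped in a small reusable function. This is a sketch for Python 3; the function name drop_groups_below and its parameters are hypothetical and not part of pandas:

```python
import pandas as pd

def drop_groups_below(df, threshold, group_col="Groups", value_col="Count"):
    """Hypothetical helper: remove every group that contains a value below threshold."""
    # Groups with at least one offending value.
    bad_groups = df.loc[df[value_col] < threshold, group_col].unique()
    # The inverted isin mask keeps only rows from the remaining groups.
    return df[~df[group_col].isin(bad_groups)].reset_index(drop=True)

# The example data from the question.
df = pd.DataFrame({"Groups": [1, 1, 1, 2, 2, 2],
                   "Count": [7, 11, 9, 12, 15, 21]})

result = drop_groups_below(df, 8)
print(result["Groups"].tolist())  # [2, 2, 2]
print(result["Count"].tolist())   # [12, 15, 21]
```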