Pandas: remove group from the data when a value in the group meets a required condition

Tags: python, pandas, dataframe, grouping

My data contains groups of values. Within each group, I would like to check whether any value is below 8. If so, the entire group should be removed from the data set.

Please note that the value I'm referring to is in a different column from the grouping column.

Example Input:

Groups Count
  1      7
  1      11
  1      9 
  2      12
  2      15
  2      21

Output:

Groups Count
  2      12
  2      15
  2      21


asked Jan 09 '16 07:01

nrcjea001

2 Answers

Based on what you described in the question, a group should be dropped as long as at least one value within it is below 8. An equivalent statement is that a group should be dropped when its minimum value is below 8.
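As a quick sanity check of that equivalence, the sketch below compares the two tests on the question's sample data. The DataFrame is rebuilt by hand from the example input, and the variable names `group_min` and `any_below` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question's example input
df = pd.DataFrame({'Groups': [1, 1, 1, 2, 2, 2],
                   'Count': [7, 11, 9, 12, 15, 21]})

# Minimum of Count within each group
group_min = df.groupby('Groups')['Count'].min()

# For each group: is any Count value below 8?
any_below = (df['Count'] < 8).groupby(df['Groups']).any()

# Both tests flag the same groups
print(group_min < 8)
print(any_below)
```

Both printed Series should mark group 1 for removal and group 2 for keeping.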

With the groupby filter feature (see Filtration in the pandas documentation), the actual filtering can be reduced to a single line. You may use the following code:

dfnew = df.groupby('Groups').filter(lambda x: x['Count'].min() >= 8)  # keep groups whose minimum is not below 8
dfnew.reset_index(drop=True, inplace=True) # reset index
dfnew = dfnew[['Groups','Count']] # rearrange the column sequence
print(dfnew)

Output:
   Groups  Count
0       2     12
1       2     15
2       2     21
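If the lambda is not needed, a similar result can be obtained with a boolean mask built from `transform`. This is a sketch of an alternative, not the answer's own method; the DataFrame is rebuilt from the question's sample data, and `THRESHOLD`, `group_min` and `kept` are hypothetical names.

```python
import pandas as pd

# Sample data rebuilt from the question's example input
df = pd.DataFrame({'Groups': [1, 1, 1, 2, 2, 2],
                   'Count': [7, 11, 9, 12, 15, 21]})

THRESHOLD = 8  # hypothetical name for the cutoff used in the question

# Broadcast each group's minimum back onto every row of that group
group_min = df.groupby('Groups')['Count'].transform('min')

# Keep only rows whose group minimum is not below the threshold
kept = df[group_min >= THRESHOLD].reset_index(drop=True)
print(kept)
```

Because `transform` returns a Series aligned with the original rows, the column order is left untouched and no Python function is called per group.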


answered Nov 15 '22 16:11

2342G456DI8

You can combine loc, unique and isin to build a mask of the groups to drop, then invert the mask with ~ to select the remaining rows. Finally, you can call reset_index:

print(df)

  Groups  Count
0       1      7
1       1     11
2       1      9
3       2     12
4       2     15
5       2     21

print(df.loc[df['Count'] < 8, 'Groups'].unique())
[1]

print(~df['Groups'].isin(df.loc[df['Count'] < 8, 'Groups'].unique()))

0    False
1    False
2    False
3     True
4     True
5     True
Name: Groups, dtype: bool

df1 = df[~df['Groups'].isin(df.loc[df['Count'] < 8, 'Groups'].unique())]
print(df1.reset_index(drop=True))

   Groups  Count
0       2     12
1       2     15
2       2     21
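The same steps can be wrapped in a small reusable helper. This is only a sketch: the function name `drop_groups_with_low_values` and its parameters are hypothetical, and the sample data adds a third group (not in the question) whose minimum is exactly 8, to show that such a group is kept.

```python
import pandas as pd

def drop_groups_with_low_values(df, group_col, value_col, threshold):
    """Drop every group that has at least one value below `threshold`."""
    # Groups that contain at least one offending value
    bad_groups = df.loc[df[value_col] < threshold, group_col].unique()
    # Invert the membership mask to keep only the other groups
    return df[~df[group_col].isin(bad_groups)].reset_index(drop=True)

# Question's sample data plus an extra group 3 (an assumption for illustration)
df = pd.DataFrame({'Groups': [1, 1, 1, 2, 2, 2, 3, 3],
                   'Count': [7, 11, 9, 12, 15, 21, 8, 10]})

result = drop_groups_with_low_values(df, 'Groups', 'Count', 8)
print(result)
```

Group 1 is removed because it contains 7, while groups 2 and 3 remain, since 8 itself is not below 8.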

answered Nov 15 '22 16:11

jezrael
