pandas dataframe delete rows with low frequency

Tags: python, pandas, dataframe

What is the best practice for removing all rows whose value in a given column occurs with low frequency?

Dataframe:

IN:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A
7   c   C
8   d   B
9   e   B

Example 1: Remove all rows whose value in column 'poo' occurs fewer than 3 times:

OUT:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A
8   d   B
9   e   B

Example 2: Remove all rows whose value in column 'bar' occurs fewer than 3 times:

OUT:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A

251

asked Mar 06 '18 17:03

AnonX

2 Answers

This should generalise pretty easily. You'll need groupby + transform + count, and then filter the result:

col = 'poo'  # 'bar'
n = 3        # 2

df[df.groupby(col)[col].transform('count').ge(n)]

   foo bar poo
0    1   a   A
1    2   a   A
2    3   a   B
3    4   b   B
4    5   b   A
5    6   b   A
7    8   d   B
8    9   e   B
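
For a self-contained check, here is a minimal sketch that rebuilds the question's frame and wraps the line above in a helper. The name `drop_rare` is hypothetical (it is not part of pandas), and the frame is reconstructed by hand from the question's table.

```python
import pandas as pd

# Sample frame reconstructed from the question's table.
df = pd.DataFrame({
    'foo': list(range(1, 10)),
    'bar': list('aaabbbcde'),
    'poo': list('AABBAACBB'),
})


def drop_rare(frame, col, n):
    """Keep rows whose value in `col` occurs at least `n` times.

    Hypothetical helper for illustration only.
    """
    # transform('count') broadcasts each group's count back onto its rows,
    # so the result is aligned with the original index.
    counts = frame.groupby(col)[col].transform('count')
    return frame[counts.ge(n)]


by_poo = drop_rare(df, 'poo', 3)  # drops the single 'C' row
by_bar = drop_rare(df, 'bar', 3)  # drops the 'c', 'd' and 'e' rows
print(by_poo)
print(by_bar)
```

Because the mask is built with `transform`, it has the same index as `df`, which is why it can be used directly for boolean indexing.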

154

answered Oct 17 '22 05:10

cs95

If I understand correctly, you can use filter:

df.groupby('poo').filter(lambda x: x['poo'].count() >= 3)
Out[81]: 
   foo bar poo
0    1   a   A
1    2   a   A
2    3   a   B
3    4   b   B
4    5   b   A
5    6   b   A
7    8   d   B
8    9   e   B

Or combine value_counts with isin:

s = df.poo.value_counts().ge(3)
df.loc[df.poo.isin(s[s].index)]
Out[89]: 
   foo bar poo
0    1   a   A
1    2   a   A
2    3   a   B
3    4   b   B
4    5   b   A
5    6   b   A
7    8   d   B
8    9   e   B
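
As a rough sketch of the value_counts idea, the snippet below spells out the intermediate steps and adds a `map`-based variant that looks up each row's count directly. The variable names (`counts`, `frequent`, `via_isin`, `via_map`) are illustrative, the `map` variant is an alternative rather than part of the answer above, and the frame is rebuilt by hand from the question.

```python
import pandas as pd

# Sample frame reconstructed from the question's table.
df = pd.DataFrame({
    'foo': list(range(1, 10)),
    'bar': list('aaabbbcde'),
    'poo': list('AABBAACBB'),
})

n = 3
counts = df['poo'].value_counts()  # one entry per distinct value in 'poo'

# isin variant: collect the values that occur at least n times,
# then keep the rows whose 'poo' is one of them.
frequent = counts[counts.ge(n)].index
via_isin = df.loc[df['poo'].isin(frequent)]

# map variant: replace each row's value by its count, then compare.
via_map = df.loc[df['poo'].map(counts).ge(n)]

print(via_isin)
print(via_isin.equals(via_map))
```

Both variants compute the counts once on the distinct values instead of calling a Python function per group.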

answered Oct 17 '22 05:10

BENY
