What is the best practice for removing all rows whose value in a given column occurs with low frequency?
Dataframe:
IN:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A
7   c   C
8   d   B
9   e   B
Example 1: Remove all rows whose value in column 'poo' occurs fewer than 3 times:
OUT:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A
8   d   B
9   e   B
Example 2: Remove all rows whose value in column 'bar' occurs fewer than 3 times:
OUT:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A
Note that the axis argument must be 0 to delete rows (in pandas drop(), axis defaults to 0, so it can be omitted). If axis=1 is specified, columns are deleted instead. Alternatively, a more intuitive way to delete rows from a DataFrame is to use the index argument.
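A minimal sketch of the axis and index arguments mentioned above. The small frame and its values are made up for illustration and are not taken from the question:

```python
import pandas as pd

# Illustrative data only; not the DataFrame from the question.
df = pd.DataFrame({"foo": [1, 2, 3], "bar": ["a", "b", "c"]})

# axis=0 is the default, so these two calls drop the same row (label 0).
rows_dropped = df.drop(0)
rows_dropped_explicit = df.drop(0, axis=0)

# The index= keyword states the intent directly and accepts a list of labels.
two_rows_dropped = df.drop(index=[0, 2])

# axis=1 removes a column instead of a row.
column_dropped = df.drop("bar", axis=1)
```

None of these calls modifies df; each returns a new DataFrame.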
To drop rows based on a condition, select the rows that match the condition and pass their index to drop: dataframe.drop(dataframe[dataframe['column'] operator value].index)
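As a rough sketch of the pattern above, assuming a made-up frame and an arbitrary condition (foo > 2) chosen only for illustration:

```python
import pandas as pd

# Illustrative data only; the condition below is an arbitrary example.
df = pd.DataFrame({"foo": [1, 2, 3, 4], "bar": ["a", "a", "b", "c"]})

# Collect the index labels of the rows that match the condition, then drop them.
to_drop = df[df["foo"] > 2].index
result = df.drop(to_drop)

# The same rows can be kept with a negated boolean mask, without calling drop.
same = df[~(df["foo"] > 2)]
```

The boolean-mask form is usually the safer choice when the index contains duplicate labels, because drop removes every row that shares a dropped label.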
A pandas DataFrame is similar to a table that stores data in rows and columns. Rows represent records (tuples) and columns represent attributes. A DataFrame can be created with pandas.DataFrame(), for example from a dictionary, without specifying columns or an index explicitly.
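For instance, the DataFrame from the question could be built from a dictionary like this (a sketch; the dtypes are whatever pandas infers):

```python
import pandas as pd

# Column values copied from the IN table in the question.
df = pd.DataFrame({
    "foo": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "bar": list("aaabbbcde"),
    "poo": list("AABBAACBB"),
})

# Frequency of each value in 'poo'; rows with rare values are the ones to remove.
poo_counts = df["poo"].value_counts()
```

The default index runs from 0 to 8, which is why the answers' outputs show index labels one less than the foo values.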
This should generalise pretty easily. You'll need groupby + transform + count, and then filter the result:
col = 'poo'  # 'bar'
n = 3        # 2
df[df.groupby(col)[col].transform('count').ge(n)]
   foo bar poo
0    1   a   A
1    2   a   A
2    3   a   B
3    4   b   B
4    5   b   A
5    6   b   A
7    8   d   B
8    9   e   B
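One way to generalise this is to wrap it in a small helper. The function name drop_rare is hypothetical and introduced here only for illustration; the body is the groupby + transform + count approach from this answer:

```python
import pandas as pd

def drop_rare(df, col, n):
    """Keep only the rows whose value in `col` occurs at least `n` times."""
    # transform('count') returns, for every row, the size of that row's group.
    counts = df.groupby(col)[col].transform("count")
    return df[counts.ge(n)]

# The DataFrame from the question.
df = pd.DataFrame({
    "foo": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "bar": list("aaabbbcde"),
    "poo": list("AABBAACBB"),
})

by_poo = drop_rare(df, "poo", 3)  # Example 1: drops the single 'C' row
by_bar = drop_rare(df, "bar", 3)  # Example 2: drops the 'c', 'd' and 'e' rows
```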
If I understand correctly, you can use groupby with filter:
df.groupby('poo').filter(lambda x: len(x) >= 3)
Out[81]: 
   foo bar poo
0    1   a   A
1    2   a   A
2    3   a   B
3    4   b   B
4    5   b   A
5    6   b   A
7    8   d   B
8    9   e   B
Or combine value_counts with isin:
s = df.poo.value_counts().ge(3)  # ge, not gt: keep values that occur at least 3 times
df.loc[df.poo.isin(s[s].index)]
Out[89]: 
   foo bar poo
0    1   a   A
1    2   a   A
2    3   a   B
3    4   b   B
4    5   b   A
5    6   b   A
7    8   d   B
8    9   e   B
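As a quick sanity check, the three approaches can be run side by side on the question's data. This is a sketch assuming a threshold of "at least n occurrences"; the variable names are illustrative:

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({
    "foo": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "bar": list("aaabbbcde"),
    "poo": list("AABBAACBB"),
})
col, n = "poo", 3

# 1. groupby + transform: per-row group size, then a boolean mask.
via_transform = df[df.groupby(col)[col].transform("count").ge(n)]

# 2. groupby + filter: keep whole groups that are large enough.
via_filter = df.groupby(col).filter(lambda g: len(g) >= n)

# 3. value_counts + isin: find the frequent values first, then select their rows.
counts = df[col].value_counts()
via_isin = df[df[col].isin(counts[counts.ge(n)].index)]

# All three select the same rows on this data.
all_agree = via_transform.equals(via_filter) and via_transform.equals(via_isin)
```

The filter version calls a Python function once per group, so on frames with many distinct values the transform or value_counts variants tend to be faster.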