Remove low-frequency values from pandas.DataFrame

Tags: python, pandas

How can I remove values that occur rarely, i.e. with a low frequency, from a column in a pandas.DataFrame? Example:

    In [4]: df[col_1].value_counts()
    Out[4]:
    0       189096
    1       110500
    2        77218
    3        61372
             ...
    2065         1
    2067         1
    1569         1
    dtype: int64

So, my question is: how do I remove values like 2065, 2067, 1569 and others? And how can I do this for ALL columns whose .value_counts() looks like this?

UPDATE: By 'low' I mean values like 2065. This value occurs in col_1 only once, and I want to remove values like this.


asked Sep 10 '15 20:09

Gilaztdinov Rustam

1 Answer

I see there are two ways you might want to do this.

For the entire DataFrame

This method removes the values that occur infrequently in the entire DataFrame. We can do it without loops, using built-in functions to speed things up.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randint(0, high=9, size=(100, 2)),
                      columns=['A', 'B'])

    threshold = 10  # Values that occur this many times or fewer are replaced with NaN.
    value_counts = df.stack().value_counts()  # Counts over the entire DataFrame

    to_remove = value_counts[value_counts <= threshold].index
    df.replace(to_remove, np.nan, inplace=True)
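With random data in the range 0 to 8, every value usually occurs far more than 10 times, so the snippet above often changes nothing. The following is a minimal sketch on a small hand-written frame where the effect is visible. The sample data and the names `counts`, `rare` and `masked` are illustrative assumptions, and it swaps `replace` for a boolean mask built with `isin` and `where`, which is an alternative technique rather than the method used above.

```python
import pandas as pd

# Hypothetical sample data: 7 and 9 each occur once in the whole frame.
df = pd.DataFrame({"A": [1, 1, 1, 2, 7], "B": [1, 2, 2, 2, 9]})

threshold = 1  # values occurring this many times or fewer are masked

# Count every value across all columns at once
# (flattening the underlying array instead of calling stack()).
counts = pd.Series(df.to_numpy().ravel()).value_counts()
rare = counts[counts <= threshold].index

# Keep cells whose value is not rare; every other cell becomes NaN.
masked = df.where(~df.isin(rare.tolist()))
print(masked)
```

Here the last row of both columns becomes NaN, and the affected columns are upcast to float because NaN cannot be stored in an integer column.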

Column-by-column

This method removes the entries that occur infrequently in each column.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randint(0, high=9, size=(100, 2)),
                      columns=['A', 'B'])

    threshold = 10  # Values that occur this many times or fewer are replaced with NaN.
    for col in df.columns:
        value_counts = df[col].value_counts()  # Counts for this column only
        to_remove = value_counts[value_counts <= threshold].index
        # Assign back rather than calling replace(..., inplace=True) on df[col],
        # which may act on a temporary copy in recent pandas versions.
        df[col] = df[col].replace(to_remove, np.nan)
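The per-column idea can also be written without an explicit loop. This is a rough sketch, assuming a hypothetical helper `mask_rare` and made-up sample data: it maps each value to its frequency within its own column and keeps only values above the threshold. Since the question asks about removing the values, the last step shows one possible follow-up, dropping any row that contains a masked cell.

```python
import pandas as pd


def mask_rare(col: pd.Series, threshold: int) -> pd.Series:
    """Hypothetical helper: NaN out values occurring `threshold` times or fewer in `col`."""
    freq = col.map(col.value_counts())  # frequency of each row's value within this column
    return col.where(freq > threshold)


# Hypothetical sample data: in column A, the values 1 and 2065 each occur once.
df = pd.DataFrame({"A": [0, 0, 0, 1, 2065], "B": [5, 5, 6, 6, 6]})

# apply() calls mask_rare once per column and forwards the keyword argument.
masked = df.apply(mask_rare, threshold=1)

# Optional: drop every row that contains at least one masked value.
cleaned = masked.dropna()
print(cleaned)
```

Whether to keep the NaN placeholders or drop the rows depends on what "remove" should mean for the data at hand; dropping rows also discards the frequent values that share a row with a rare one.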

answered Sep 19 '22 15:09

thecircus
