Pandas: Get values from column that appear more than X times

Tags: python, pandas

I have a data frame in pandas and would like to get all the values of a certain column that appear more than X times. I know this should be easy but somehow I am not getting anywhere with my current attempts.

Here is an example:

>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])
>>> df2
   mi  uid
0   1    0
1   2    0
2   1    0
3   1    0

Now suppose I want to get all values from column "mi" that appear more than 2 times. The result should be:

>>> <fancy query>
array([1])

I have tried a couple of things with groupby and count, but I always end up with a Series of the values and their respective counts, and I don't know how to extract from it the values whose count is more than X:

>>> df2.groupby('mi').mi.count() > 2
mi
1     True
2    False
dtype: bool

But how can I use this now to get the values of mi for which the result is True?

Any hints appreciated :)

asked Mar 11 '14 08:03

Robin

2 Answers

Or how about this:

Create the table:

>>> import pandas as pd
>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])

Get the count of each value:

>>> vc = df2.mi.value_counts()
>>> print(vc)
1    3
2    1

Print out those that occur more than 2 times:

>>> print(vc[vc > 2].index.tolist())
[1]
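
The same mask also works when the threshold is a variable, and it finishes the groupby attempt from the question. A minimal sketch, assuming the df2 from the question; the names threshold, frequent and frequent_gb are illustrative and not part of the original answer.

```python
import pandas as pd

df2 = pd.DataFrame([{"uid": 0, "mi": 1}, {"uid": 0, "mi": 2},
                    {"uid": 0, "mi": 1}, {"uid": 0, "mi": 1}])

threshold = 2  # hypothetical name for the "X" in the question

# Count how often each value of "mi" occurs.
vc = df2["mi"].value_counts()

# Keep only the counts above the threshold; the index holds the values.
frequent = vc[vc > threshold].index.to_numpy()
print(frequent)  # [1]

# The groupby attempt from the question can be finished the same way:
# the boolean Series acts as a mask on the counts it was derived from.
counts = df2.groupby("mi")["mi"].count()
frequent_gb = counts[counts > threshold].index.to_numpy()
print(frequent_gb)  # [1]
```

Both variants return a NumPy array, which matches the array([1]) the question asks for.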

answered Oct 14 '22 22:10

juniper-

I use this:

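 df2.mi.value_counts().rename_axis("mi").reset_index(name="count").query("count > 2")["mi"]

The part before query() gives me a data frame with two columns: mi and count. The rename_axis("mi") call pins the name of the first column, which otherwise depends on the pandas version (older releases call it index). The query() filters on count and then we pull out the values.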
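
The chain can be unpacked step by step to inspect the intermediate frame. A minimal sketch, assuming the df2 from the question and a threshold of 2; it names the first column explicitly with rename_axis("mi"), since the default name that reset_index() gives that column is assumed to vary between pandas versions. The variable names table, kept and result are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame([{"uid": 0, "mi": 1}, {"uid": 0, "mi": 2},
                    {"uid": 0, "mi": 1}, {"uid": 0, "mi": 1}])

# Step 1: counts per value, as a Series indexed by the values of "mi".
vc = df2["mi"].value_counts()

# Step 2: turn the Series into a two-column frame with explicit names.
table = vc.rename_axis("mi").reset_index(name="count")

# Step 3: keep only the rows whose count is above the threshold.
kept = table.query("count > 2")

# Step 4: pull out the surviving values.
result = kept["mi"].tolist()
print(result)  # [1]
```

Splitting the chain this way makes it easy to print table and check the column names before filtering.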

answered Oct 15 '22 00:10

nicolaskruchten
