
Pandas: Get values from column that appear more than X times

Tags: python, pandas

I have a data frame in pandas and would like to get all the values of a certain column that appear more than X times. I know this should be easy but somehow I am not getting anywhere with my current attempts.

Here is an example:

>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])
>>> df2
   mi  uid
0   1    0
1   2    0
2   1    0
3   1    0

Now suppose I want to get all values from column "mi" that appear more than 2 times. The result should be

>>> <fancy query>
array([1])

I have tried a couple of things with groupby and count, but I always end up with a series of the values and their respective counts, and I don't know how to extract from it the values whose count is more than X:

>>> df2.groupby('mi').mi.count() > 2
mi
1     True
2    False
dtype: bool

But how can I use this now to get the values of mi that are true?

Any hints appreciated :)

Robin asked Mar 11 '14 08:03



2 Answers

Or how about this:

Create the table:

>>> import pandas as pd
>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])

Get the count of each value:

>>> vc = df2.mi.value_counts()
>>> print(vc)
1    3
2    1

Print the first value that occurs more than 2 times (drop the [0] to get all of them):

>>> print(vc[vc > 2].index[0])
1
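
To get every value above the threshold rather than only the first, the same boolean mask can be kept and the whole index returned. Here is a minimal sketch, assuming the df2 frame from the question; the helper name values_above and its parameters are made up for illustration and are not part of pandas. The same mask also works on the groupby(...).count() series from the question.

```python
import pandas as pd

df2 = pd.DataFrame([{"uid": 0, "mi": 1}, {"uid": 0, "mi": 2},
                    {"uid": 0, "mi": 1}, {"uid": 0, "mi": 1}])


def values_above(frame, column, threshold):
    """Return the values of `column` that occur more than `threshold` times.

    `values_above` is a hypothetical helper, not a pandas function.
    """
    counts = frame[column].value_counts()  # maps each value to its count
    return counts[counts > threshold].index.to_numpy()


frequent = values_above(df2, "mi", 2)
print(frequent)  # [1]

# The boolean series from the question can be used as a mask the same way.
grouped = df2.groupby("mi").mi.count()
frequent_grouped = grouped[grouped > 2].index.to_numpy()
```

Indexing a series with a boolean series of the same shape keeps only the entries marked True, so the index of the filtered counts is exactly the set of values asked for.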
juniper- answered Oct 14 '22 22:10


I use this:

 df2.mi.value_counts().reset_index(name="count").query("count > 5")["index"] 

The part before query() gives me a data frame with two columns: index and count. The query() filters on count and then we pull out the values.
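
One caveat: the name of the first column depends on the pandas version. Older releases call it index, while pandas 2.0 and later name it after the original column (mi here), so ["index"] raises a KeyError there. Below is a sketch that does not depend on either name, assuming the df2 frame from the question; the column names value and n and the variable threshold are chosen here for illustration only.

```python
import pandas as pd

df2 = pd.DataFrame([{"uid": 0, "mi": 1}, {"uid": 0, "mi": 2},
                    {"uid": 0, "mi": 1}, {"uid": 0, "mi": 1}])

threshold = 2  # the question's "more than 2 times"; illustrative variable

counts = (
    df2["mi"]
    .value_counts()
    .rename_axis("value")    # name the index explicitly, whatever the version
    .reset_index(name="n")   # data frame with two columns: "value" and "n"
)

# Filter on the count column, then pull out the matching values.
frequent = counts.query(f"n > {threshold}")["value"].to_numpy()
```

Naming both columns up front keeps the query() and the final column lookup stable across pandas versions.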

nicolaskruchten answered Oct 15 '22 00:10