Python Pandas Counting the Occurrences of a Specific value

Tags:

python

pandas

I am trying to find the number of times a certain value appears in one column.

I have made the dataframe with data = pd.DataFrame.from_csv('data/DataSet2.csv')

and now I want to find the number of times something appears in a column. How is this done?

I thought it was the below, where I am looking in the education column and counting the number of times ? occurs.

The code below shows that I am trying to find the number of times 9th appears, and the error is what I get when I run the code.

Code

missing2 = df.education.value_counts()['9th']
print(missing2)

Error

KeyError: '9th' 
asked Feb 08 '16 at 18:02 by JJSmith

2 Answers

You can create a subset of the data with your condition and then use shape or len:

print(df)
  col1 education
0    a       9th
1    b       9th
2    c       8th

print(df.education == '9th')
0     True
1     True
2    False
Name: education, dtype: bool

print(df[df.education == '9th'])
  col1 education
0    a       9th
1    b       9th

print(df[df.education == '9th'].shape[0])
2
print(len(df[df['education'] == '9th']))
2
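To run the snippets above end to end, here is a minimal sketch that builds the three-row sample frame by hand and wraps the mask-and-len idea in a small helper. The frame construction is an assumption (the asker's CSV is not shown), and count_value is a made-up name for illustration, not a pandas function.

```python
import pandas as pd

# Hand-built copy of the sample frame printed above (assumed, not loaded from the CSV).
df = pd.DataFrame({"col1": ["a", "b", "c"],
                   "education": ["9th", "9th", "8th"]})


def count_value(frame, column, value):
    """Hypothetical helper: number of rows where frame[column] equals value."""
    mask = frame[column] == value   # boolean Series, True where the row matches
    return len(frame[mask])         # equivalent to frame[mask].shape[0]


n_9th = count_value(df, "education", "9th")
n_12th = count_value(df, "education", "12th")  # value absent from the column
print(n_9th, n_12th)
```

Because the count comes from filtering rather than from a label lookup, a value that never occurs simply yields an empty subset and a count of 0.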

Performance is interesting: the fastest solution is to compare the underlying NumPy array and sum the result:

[graph: timing comparison, image not preserved]

Code:

import string

import numpy as np
import pandas as pd
import perfplot

np.random.seed(123)


# Each kernel counts the rows whose education value equals 'a'.
def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    # Compare the underlying array directly, skipping the Series wrapper.
    return (df.education.values == 'a').sum()

def make_df(n):
    # One column of n random ASCII letters.
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
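Since the benchmark passes equality_check=False, perfplot does not verify that the kernels return the same number. A quick sanity check, sketched here without perfplot and at a single small size (1000 rows is an arbitrary choice), compares all five expressions against a plain-Python count:

```python
import string

import numpy as np
import pandas as pd

np.random.seed(123)

# Same data generator as in the benchmark, at a small fixed size.
letters = list(string.ascii_letters)
df = pd.DataFrame(np.random.choice(letters, size=1000), columns=["education"])

# Plain-Python reference count, independent of any pandas filtering.
expected = list(df["education"]).count("a")

# The five expressions from the benchmark kernels.
counts = {
    "shape": df[df.education == "a"].shape[0],
    "len_df": len(df[df["education"] == "a"]),
    "query_count": df.query('education == "a"').education.count(),
    "sum_mask": (df.education == "a").sum(),
    "sum_mask_numpy": (df.education.values == "a").sum(),
}

# True only if every approach matches the reference count.
all_agree = all(int(c) == expected for c in counts.values())
print(all_agree)
```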
answered Oct 02 '22 at 02:10 by jezrael

A couple of ways, using count or sum:

In [338]: df
Out[338]:
  col1 education
0    a       9th
1    b       9th
2    c       8th

In [335]: df.loc[df.education == '9th', 'education'].count()
Out[335]: 2

In [336]: (df.education == '9th').sum()
Out[336]: 2

In [337]: df.query('education == "9th"').education.count()
Out[337]: 2
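A related note on the KeyError in the question: value_counts() returns a Series indexed by the distinct values of the column, so value_counts()['9th'] raises whenever '9th' is not exactly one of those labels. The sketch below uses Series.get with a default to avoid the exception. The sample data, with a leading space in each value, is purely an assumption about one way the label could fail to match; the asker's CSV is not shown.

```python
import pandas as pd

# Made-up column whose values carry a leading space (assumption, for illustration only).
education = pd.Series([" 9th", " 9th", " 8th"], name="education")

counts = education.value_counts()

# Series.get returns the default instead of raising when the label is missing.
raw_9th = counts.get("9th", 0)

# Stripping whitespace before counting makes the label match.
clean_9th = education.str.strip().value_counts().get("9th", 0)

print(raw_9th, clean_9th)
```

The mask-based counts above sidestep the lookup entirely, which is why they return 0 rather than raising for a value that is absent.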
answered Oct 02 '22 at 01:10 by Zero
