Python Pandas Counting the Occurrences of a Specific value

Tags: python, pandas

I am trying to find the number of times a certain value appears in one column.

I have made the dataframe with data = pd.DataFrame.from_csv('data/DataSet2.csv') and now I want to find the number of times something appears in a column. How is this done?

I thought it was the below, where I am looking in the education column and counting the number of times ? occurs.

The code below shows that I am trying to find the number of times 9th appears, and the error is what I get when I run the code.

Code

    missing2 = df.education.value_counts()['9th']
    print(missing2)

Error

KeyError: '9th'

asked Feb 08 '16 18:02

JJSmith

2 Answers

You can create a subset of the data with your condition and then use shape or len:

    print(df)
      col1 education
    0    a       9th
    1    b       9th
    2    c       8th

    print(df.education == '9th')
    0     True
    1     True
    2    False
    Name: education, dtype: bool

    print(df[df.education == '9th'])
      col1 education
    0    a       9th
    1    b       9th

    print(df[df.education == '9th'].shape[0])
    2
    print(len(df[df['education'] == '9th']))
    2
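The subset-and-count steps above can be sketched as a self-contained script. The frame here is a hypothetical one built to match the printed example, since the original data file is not available:

```python
import pandas as pd

# Hypothetical sample frame mirroring the one printed above.
df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "education": ["9th", "9th", "8th"],
})

# Boolean mask: True where the column equals the target value.
mask = df["education"] == "9th"

# Subset of matching rows, then count them two equivalent ways.
subset = df[mask]
count_from_shape = subset.shape[0]
count_from_len = len(subset)

print(count_from_shape, count_from_len)
```

Both counts come from the same filtered frame, so they always agree; which one to use is mostly a matter of taste.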

Performance is interesting: the fastest solution is to compare the underlying NumPy array with the value and call sum:

[graph: https://i.stack.imgur.com/PRDOD.png]

Code:

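    import string

    import numpy as np
    import pandas as pd
    import perfplot

    np.random.seed(123)


    def shape(df):
        # Filter the rows, then read the row count from the shape.
        return df[df.education == 'a'].shape[0]

    def len_df(df):
        # Filter the rows, then take the length of the result.
        return len(df[df['education'] == 'a'])

    def query_count(df):
        # Filter with query, then count the non-null values.
        return df.query('education == "a"').education.count()

    def sum_mask(df):
        # Sum the boolean mask directly (True counts as 1).
        return (df.education == 'a').sum()

    def sum_mask_numpy(df):
        # Same as sum_mask, but compare on the underlying NumPy array.
        return (df.education.values == 'a').sum()

    def make_df(n):
        # Build a single-column frame of n random ASCII letters.
        L = list(string.ascii_letters)
        df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
        return df

    perfplot.show(
        setup=make_df,
        kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
        n_range=[2**k for k in range(2, 25)],
        logx=True,
        logy=True,
        equality_check=False,
        xlabel='len(df)')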
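If perfplot is not installed, a rough check of the five counting functions above can be done with the standard-library timeit module. This is only a sketch: the frame size, the letter pool, and the repeat count are arbitrary assumptions, and the timings it prints will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 random letters drawn from a small pool.
rng = np.random.default_rng(123)
letters = list("abcdefghij")
df = pd.DataFrame({"education": rng.choice(letters, size=10_000)})

# The same five approaches as in the benchmark, keyed by name.
candidates = {
    "shape": lambda d: d[d.education == "a"].shape[0],
    "len_df": lambda d: len(d[d["education"] == "a"]),
    "query_count": lambda d: d.query('education == "a"').education.count(),
    "sum_mask": lambda d: (d.education == "a").sum(),
    "sum_mask_numpy": lambda d: (d.education.values == "a").sum(),
}

# Confirm that every approach returns the same count before timing them.
counts = {name: int(fn(df)) for name, fn in candidates.items()}
all_agree = len(set(counts.values())) == 1

for name, fn in candidates.items():
    seconds = timeit.timeit(lambda: fn(df), number=20)
    print(f"{name:15s} count={counts[name]} time={seconds:.4f}s")
```

A single frame size says nothing about how the approaches scale, which is what the perfplot graph is for.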

answered Oct 02 '22 02:10

jezrael

A couple of ways, using count or sum:

    In [338]: df
    Out[338]:
      col1 education
    0    a       9th
    1    b       9th
    2    c       8th

    In [335]: df.loc[df.education == '9th', 'education'].count()
    Out[335]: 2

    In [336]: (df.education == '9th').sum()
    Out[336]: 2

    In [337]: df.query('education == "9th"').education.count()
    Out[337]: 2
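On the KeyError in the question: value_counts() returns a Series indexed by the values that actually occur, so indexing it with a label that is not present raises KeyError. The sketch below uses Series.get with a default of 0 to avoid that. The leading spaces in the sample data are purely an assumption, showing one possible reason a value that looks present is not matched; the question does not confirm the cause.

```python
import pandas as pd

# Hypothetical data with stray leading whitespace (an assumption, not from the question).
df = pd.DataFrame({"education": [" 9th", " 9th", " 8th"]})

counts = df["education"].value_counts()

# counts['9th'] would raise KeyError here, because the stored label is ' 9th'.
# Series.get returns the default instead of raising.
missing_safe = counts.get("9th", 0)

# Stripping whitespace before counting makes the lookup succeed.
cleaned_counts = df["education"].str.strip().value_counts()
found = cleaned_counts.get("9th", 0)

print(missing_safe, found)
```

The mask-based counts shown above, such as (df.education == '9th').sum(), also return 0 rather than raising when the value is absent.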

answered Oct 02 '22 01:10

Zero
