Using pandas, I would like to get the count of a specific value in a column. I know that df.somecolumn.ravel() will give me all the unique values and their counts. But how do I get the count of one specific value?
In [5]: df
Out[5]:
col
1
1
1
1
2
2
2
1
Desired:
To get the count of 1:
In [6]: df.somecalulation(1)
Out[6]: 5
To get the count of 2:
In [7]: df.somecalulation(2)
Out[7]: 3
Use the sum() function to count specific values in a DataFrame column. Comparing a column to a value with == produces a boolean Series that is True for the matching rows, and calling sum() on that Series counts them. The same pattern works for any other boolean condition, such as > or <.
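A minimal sketch of the sum() approach, assuming a DataFrame named df with a single column col that holds the values from the question. The result variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question: five 1s and three 2s.
df = pd.DataFrame({"col": [1, 1, 1, 1, 2, 2, 2, 1]})

# Comparing the column to a value gives a boolean Series;
# summing it counts the True entries.
count_of_ones = int((df["col"] == 1).sum())
count_of_twos = int((df["col"] == 2).sum())

# Any other boolean condition can be counted the same way.
count_greater_than_one = int((df["col"] > 1).sum())

print(count_of_ones, count_of_twos, count_greater_than_one)
```

The int() calls only convert the NumPy integer to a plain Python int for display; the comparison and sum do the actual counting.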
We can also count with the value_counts() method. Called on a particular column, it returns the number of occurrences of each distinct value in that column; it can also be called on the entire DataFrame.
Use pandas.DataFrame.query() to get a column's values based on a condition on another column.
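One way query() could be applied to the counting problem is to select the matching rows and take the length of the result. This is a sketch under the assumption that the column is named col; target is a hypothetical variable introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 1, 1, 1, 2, 2, 2, 1]})

# Hypothetical value to look for; "@" lets the query string
# refer to a local Python variable.
target = 1
matching_rows = df.query("col == @target")

# The number of selected rows is the count of the value.
count = len(matching_rows)
print(count)
```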
Count values in a pandas DataFrame: after importing the libraries and creating the DataFrame, the .count() method counts the non-missing values in each column. To count the values in each row instead, pass axis=1 or axis='columns'.
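For contrast with counting one specific value, here is a small sketch of what .count() returns. The DataFrame and its column names a and b are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data with some missing entries.
df = pd.DataFrame({"a": [1, 2, np.nan], "b": [4, np.nan, np.nan]})

# Default (axis=0): non-missing values per column.
per_column = df.count()

# axis=1 (or axis="columns"): non-missing values per row.
per_row = df.count(axis=1)

print(per_column.tolist(), per_row.tolist())
```

Note that count() reports how many cells are filled in; it does not count occurrences of a particular value.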
Note: Running the value_counts() method on the DataFrame (rather than on a specific column) returns the number of occurrences of each unique row, that is, each distinct combination of values across the DataFrame columns. An alternative technique is to use groupby() followed by size() to count occurrences in a specific column.
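The two techniques in the note can be sketched as follows. The second DataFrame, df2, and its column other are hypothetical and exist only to show the row-wise behaviour; DataFrame.value_counts() assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 1, 1, 1, 2, 2, 2, 1]})

# groupby(...).size() gives the number of rows in each group,
# indexed by the distinct values of the column.
sizes = df.groupby("col").size()
print(int(sizes.loc[1]), int(sizes.loc[2]))

# Hypothetical two-column frame: value_counts() on the whole
# DataFrame counts distinct rows, not individual column values.
df2 = pd.DataFrame({"col": [1, 1, 2], "other": ["x", "x", "y"]})
row_counts = df2.value_counts()
print(int(row_counts.loc[(1, "x")]))
```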
You can try value_counts:
# Store the result under a new name so that df keeps the original data.
counts = df['col'].value_counts().reset_index()
counts.columns = ['col', 'count']
print(counts)
   col  count
0    1      5
1    2      3
EDIT:
# Count the rows where col equals 1.
print((df['col'] == 1).sum())
5
Or:
def somecalulation(x):
    # Count the rows where col equals x.
    return (df['col'] == x).sum()
print(somecalulation(1))
5
print(somecalulation(2))
3
Or:
ser = df['col'].value_counts()
def somecalulation(s, x):
    # Look up the precomputed count of x.
    return s[x]
print(somecalulation(ser, 1))
5
print(somecalulation(ser, 2))
3
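One caveat with looking a value up in the value_counts result: indexing with a value that never occurs raises a KeyError. A possible variant, sketched here with a hypothetical helper named count_value, uses Series.get with a default of 0 instead.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 1, 1, 1, 2, 2, 2, 1]})
counts = df["col"].value_counts()


def count_value(counts, value):
    """Return the count of value, or 0 if it never occurs (hypothetical helper)."""
    return int(counts.get(value, 0))


print(count_value(counts, 1))
print(count_value(counts, 2))
print(count_value(counts, 3))  # 3 is not in the column
```

Computing value_counts once and reusing it is a reasonable choice when several different values need to be looked up.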
EDIT2:
If you need something really fast, use numpy.in1d (numpy.isin is the equivalent in recent NumPy releases):
import pandas as pd
import numpy as np
a = pd.Series([1, 1, 1, 1, 2, 2])
# for testing, len(a) = 6000
a = pd.concat([a]*1000).reset_index(drop=True)
print(np.in1d(a, 1).sum())
4000
print((a == 1).sum())
4000
print(np.sum(a == 1))
4000
Timings:
len(a)=6:
In [131]: %timeit np.in1d(a,1).sum()
The slowest run took 9.17 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 29.9 µs per loop
In [132]: %timeit np.sum(a == 1)
10000 loops, best of 3: 196 µs per loop
In [133]: %timeit (a == 1).sum()
1000 loops, best of 3: 180 µs per loop
len(a)=6000:
In [135]: %timeit np.in1d(a,1).sum()
The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 48.5 µs per loop
In [136]: %timeit np.sum(a == 1)
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 273 µs per loop
In [137]: %timeit (a == 1).sum()
1000 loops, best of 3: 271 µs per loop
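The timings above come from IPython's %timeit. Outside IPython, a comparable measurement could be sketched with the standard timeit module. This is only an illustration: np.isin is used here in place of np.in1d as an assumption about newer NumPy versions, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same test data as above: 6000 values, 4000 of which equal 1.
a = pd.concat([pd.Series([1, 1, 1, 1, 2, 2])] * 1000).reset_index(drop=True)

# np.isin stands in for np.in1d (assumption for newer NumPy versions).
candidates = {
    "np.isin(a, 1).sum()": lambda: np.isin(a, 1).sum(),
    "np.sum(a == 1)": lambda: np.sum(a == 1),
    "(a == 1).sum()": lambda: (a == 1).sum(),
}

results = {}
for label, func in candidates.items():
    # Record the count each approach returns.
    results[label] = int(func())
    # Best of 3 repeats, 100 calls each, converted to microseconds per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f"{label}: {best * 1e6:.1f} µs per call, count = {results[label]}")
```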
If you take the value_counts return, you can query it for multiple values:
import pandas as pd
a = pd.Series([1, 1, 1, 1, 2, 2])
counts = a.value_counts()
>>> counts[1], counts[2]
(4, 2)
However, to count only a single item, it would be faster to use:
import numpy as np
np.sum(a == 1)