Pandas, Get count of a single value in a Column of a Dataframe

Tags:

python

pandas

Using pandas, I would like to get the count of a specific value in a column. I know that df.somecolumn.ravel() will give me all the unique values and their counts. But how do I get the count of one specific value?

In[5]:df
Out[5]:
        col 
         1
         1
         1
         1
         2
         2
         2
         1

Desired :

  To get count of 1.

  In[6]:df.somecalulation(1)
  Out[6]: 5

  To get count of 2.

  In[6]:df.somecalulation(2)
  Out[6]: 3
Randhawa asked Mar 17 '16 17:03



2 Answers

You can try value_counts:

# store the result under a new name so the original df stays intact
counts = df['col'].value_counts().reset_index()
counts.columns = ['col', 'count']
print(counts)
   col  count
0    1      5
1    2      3

EDIT:

print((df['col'] == 1).sum())
5

Or:

def somecalulation(x):
    return (df['col'] == x).sum()

print(somecalulation(1))
5
print(somecalulation(2))
3
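
The helper above reads the DataFrame from a global variable. A minimal sketch of the same comparison-and-sum idea with the frame and column passed in explicitly is shown here; the name count_value and its parameters are hypothetical, chosen only for illustration.

```python
import pandas as pd

def count_value(frame, column, value):
    """Return how many rows of frame[column] equal value."""
    # Comparing a column to a scalar yields a boolean Series;
    # summing it counts the True entries.
    return int((frame[column] == value).sum())

# Same data as in the question
df = pd.DataFrame({"col": [1, 1, 1, 1, 2, 2, 2, 1]})

print(count_value(df, "col", 1))  # 5
print(count_value(df, "col", 2))  # 3
print(count_value(df, "col", 7))  # 0, the value does not occur
```

A value that never occurs simply gives 0 here, because the comparison produces no True entries.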

Or:

ser = df['col'].value_counts()

def somecalulation(s, x):
    return s[x]

print(somecalulation(ser, 1))
5
print(somecalulation(ser, 2))
3
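
One caveat with looking a value up in the value_counts result: indexing with a value that does not occur in the column raises a KeyError. A possible workaround, sketched here with the hypothetical helper name count_or_zero, is Series.get with a default of 0.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({"col": [1, 1, 1, 1, 2, 2, 2, 1]})
ser = df["col"].value_counts()

def count_or_zero(s, x):
    # Series.get returns the default instead of raising KeyError
    # when the label x is missing from the index.
    return int(s.get(x, 0))

print(count_or_zero(ser, 1))  # 5
print(count_or_zero(ser, 2))  # 3
print(count_or_zero(ser, 7))  # 0
```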

EDIT2:

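If you need something faster, use numpy.in1d. In the timings below it was several times faster than the comparison-based approaches (newer NumPy versions provide numpy.isin for the same membership test):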

import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, 1, 2, 2])

#for testing len(a) = 6000
a = pd.concat([a]*1000).reset_index(drop=True)

print(np.in1d(a, 1).sum())
4000
print((a == 1).sum())
4000
print(np.sum(a == 1))
4000

Timings:

len(a)=6:

In [131]: %timeit np.in1d(a,1).sum()
The slowest run took 9.17 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 29.9 µs per loop

In [132]: %timeit np.sum(a == 1)
10000 loops, best of 3: 196 µs per loop

In [133]: %timeit (a == 1).sum()
1000 loops, best of 3: 180 µs per loop

len(a)=6000:

In [135]: %timeit np.in1d(a,1).sum()
The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 48.5 µs per loop

In [136]: %timeit np.sum(a == 1)
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 273 µs per loop

In [137]: %timeit (a == 1).sum()
1000 loops, best of 3: 271 µs per loop
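
The numbers above depend on the machine and on the pandas and NumPy versions in use, so it may be worth re-measuring. The following is a rough sketch using the standard-library timeit module instead of the IPython %timeit magic. It swaps in numpy.isin for numpy.in1d, on the assumption that the installed NumPy may deprecate the latter; the candidates dictionary and the repeat counts are arbitrary choices for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# 6000-element test Series, built the same way as in the answer
a = pd.concat([pd.Series([1, 1, 1, 1, 2, 2])] * 1000).reset_index(drop=True)

# Three ways of counting how often 1 occurs
candidates = {
    "np.isin(a, 1).sum()": lambda: np.isin(a, 1).sum(),
    "np.sum(a == 1)": lambda: np.sum(a == 1),
    "(a == 1).sum()": lambda: (a == 1).sum(),
}

for label, func in candidates.items():
    # All candidates must agree before their speed is compared
    assert func() == 4000
    # Best of 3 runs of 200 calls each, reported per call
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    print(f"{label}: {best * 1e6:.1f} µs per call")
```

No output is shown because the timings will differ from run to run.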
jezrael answered Oct 13 '22 01:10


If you take the value_counts return, you can query it for multiple values:

import pandas as pd

a = pd.Series([1, 1, 1, 1, 2, 2])
counts = a.value_counts()
>>> counts[1], counts[2]
(4, 2)

However, to count only a single item, it would be faster to use:

import numpy as np
np.sum(a == 1)
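
When several values are wanted at once and some of them might not occur in the data, one option is to reindex the value_counts result. This is a sketch only: the list wanted is a hypothetical example, and fill_value=0 is what turns a missing value into a count of zero.

```python
import pandas as pd

a = pd.Series([1, 1, 1, 1, 2, 2])
counts = a.value_counts()

# Values of interest; 3 does not occur in the Series
wanted = [1, 2, 3]

# reindex keeps the requested labels in order and fills absent ones with 0
selected = counts.reindex(wanted, fill_value=0)
print(selected.to_dict())  # {1: 4, 2: 2, 3: 0}
```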
Ami Tavory answered Oct 13 '22 01:10