Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas: count things

Tags:

python

pandas

In the following, male_trips is a big pandas data frame and stations is a small pandas data frame. For each station id I'd like to know how many male trips took place. The following does the job, but takes a long time:

mc = [ sum( male_trips['start_station_id'] == id ) for id in stations['id'] ] 

how should I go about this instead?


Update! So there were two main approaches: groupby() followed by size(), and the simpler .value_counts(). I did a quick timeit, and the groupby approach wins by quite a large margin! Here is the code:

from timeit import Timer setup = "import pandas; male_trips=pandas.load('maletrips')" a  = "male_trips.start_station_id.value_counts()" b = "male_trips.groupby('start_station_id').size()" Timer(a,setup).timeit(100) Timer(b,setup).timeit(100) 

and here is the result:

In [4]: Timer(a,setup).timeit(100) # <- this is value_counts Out[4]: 9.709594964981079  In [5]: Timer(b,setup).timeit(100) # <- this is groupby / size Out[5]: 1.5574288368225098 

Note that, at this speed, for exploring data typing value_counts is marginally quicker and less remembering!

like image 398
Mike Dewar Avatar asked Oct 12 '12 21:10

Mike Dewar


People also ask

How do you count items in a data frame?

len() returns the number of items(the length) of a list object(also works for dictionary, string, tuple or range objects). So, for getting row counts of a DataFrame, simply use len(df) .

How do I count the number of rows in Pandas?

You can use len(df. index) to find the number of rows in pandas DataFrame, df. index returns RangeIndex(start=0, stop=8, step=1) and use it on len() to get the count.


1 Answers

I'd do like Vishal but instead of using sum() using size() to get a count of the number of rows allocated to each group of 'start_station_id'. So:

df = male_trips.groupby('start_station_id').size() 
like image 77
Dani Arribas-Bel Avatar answered Oct 01 '22 12:10

Dani Arribas-Bel