
Group by and find top n value_counts pandas

I have a dataframe of taxi data that looks like this:

    Neighborhood    Borough        Time
    Midtown         Manhattan      X
    Melrose         Bronx          Y
    Grant City      Staten Island  Z
    Midtown         Manhattan      A
    Lincoln Square  Manhattan      B

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough by number of pickups. I tried this:

    df['Neighborhood'].groupby(df['Borough']).value_counts()

Which gives me something like this:

    borough
    Bronx          High Bridge          3424
                   Mott Haven           2515
                   Concourse Village    1443
                   Port Morris          1153
                   Melrose               492
                   North Riverdale       463
                   Eastchester           434
                   Concourse             395
                   Fordham               252
                   Wakefield             214
                   Kingsbridge           212
                   Mount Hope            200
                   Parkchester           191
    ......
    Staten Island  Castleton Corners       4
                   Dongan Hills            4
                   Eltingville             4
                   Graniteville            4
                   Great Kills             4
                   Castleton               3
                   Woodrow                 1

How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren't helpful to my case.

ytk asked Feb 12 '16 14:02



1 Answer

I think you can use nlargest - you can change 1 to 5:

    s = df['Neighborhood'].groupby(df['Borough']).value_counts()
    print(s)

    Borough
    Bronx          Melrose            7
    Manhattan      Midtown           12
                   Lincoln Square     2
    Staten Island  Grant City        11
    dtype: int64

    # group on the Borough level only, then keep the largest count per borough
    print(s.groupby(level=0).nlargest(1))

    Bronx          Bronx          Melrose        7
    Manhattan      Manhattan      Midtown       12
    Staten Island  Staten Island  Grant City    11
    dtype: int64
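To try the nlargest approach end to end, here is a minimal sketch on a small made-up dataframe. The neighborhoods and counts are invented for illustration and are not the asker's data, and it uses n=2 only because the sample is tiny; change it to 5 for the real data.

```python
import pandas as pd

# Hypothetical sample data (not the asker's dataset): one row per pickup.
df = pd.DataFrame({
    "Neighborhood": ["Midtown"] * 3 + ["Lincoln Square"] * 2 + ["Chelsea"]
                    + ["Melrose"] * 2 + ["Mott Haven"],
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3,
})

# Count pickups per neighborhood within each borough.
counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# Group the counts by the Borough index level only, then keep the
# n largest counts in each borough.
top2 = counts.groupby(level=0).nlargest(2)
print(top2)
```

Chelsea, the least frequent Manhattan neighborhood in this sample, is dropped, while both Bronx neighborhoods are kept.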

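Note that nlargest on a grouped Series prepends the group key as an additional index level, which is why Borough appears twice in the output. Passing group_keys=False to groupby avoids this.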
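If the repeated Borough level in the result is unwanted, one possible variation is sketched below. It again uses made-up data, and the variable names are only illustrative. It shows two options: passing group_keys=False, or taking head(n) per group, which relies on value_counts already sorting each group in descending order.

```python
import pandas as pd

# Hypothetical sample data (not the asker's dataset): one row per pickup.
df = pd.DataFrame({
    "Neighborhood": ["Midtown"] * 3 + ["Lincoln Square"] * 2 + ["Chelsea"]
                    + ["Melrose"] * 2 + ["Mott Haven"],
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3,
})

counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# Option 1: group_keys=False stops the group key from being prepended,
# so the result keeps the original two index levels.
top_nlargest = counts.groupby(level=0, group_keys=False).nlargest(2)

# Option 2: counts are already sorted in descending order within each
# borough, so the first n rows of each group are the top n.
top_head = counts.groupby(level=0).head(2)

print(top_nlargest)
print(top_head)
```

Both options should select the same rows here. head(n) keeps the first n rows without re-sorting, so it is only equivalent when the counts are already ordered within each group.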

jezrael answered Sep 19 '22 02:09