Group by and find top n value_counts pandas

Tags: python, pandas, dataframe

I have a dataframe of taxi data with two columns that looks like this:

Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         Bronx          Y
Grant City      Staten Island  Z
Midtown         Manhattan      A
Lincoln Square  Manhattan      B

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough with the most pickups. I tried this:

df['Neighborhood'].groupby(df['Borough']).value_counts()

Which gives me something like this:

borough
Bronx          High Bridge         3424
               Mott Haven          2515
               Concourse Village   1443
               Port Morris         1153
               Melrose              492
               North Riverdale      463
               Eastchester          434
               Concourse            395
               Fordham              252
               Wakefield            214
               Kingsbridge          212
               Mount Hope           200
               Parkchester          191
......
Staten Island  Castleton Corners      4
               Dongan Hills           4
               Eltingville            4
               Graniteville           4
               Great Kills            4
               Castleton              3
               Woodrow                1

How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren't helpful to my case.

asked Feb 12 '16 14:02

ytk

1 Answer

I think you can use nlargest - you can change 1 to 5:

s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64

# group on the borough level only, then keep the largest count(s) per borough
print(s.groupby(level=0).nlargest(1))
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64

Grouping on level 0 prepends the borough as an extra index level, which is why it appears twice in the result. Passing group_keys=False to groupby avoids the duplicate level.
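For reference, here is a minimal self-contained sketch of the same idea on Python 3. The sample rows and the helper name `top_n_neighborhoods` are made up for illustration; only the column names `Neighborhood` and `Borough` come from the question. It assumes that grouping the counts by the first index level with `group_keys=False` and then calling `nlargest(n)` is what is wanted.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Neighborhood": ["Midtown", "Melrose", "Grant City", "Midtown",
                     "Lincoln Square", "Mott Haven", "Mott Haven",
                     "Melrose", "Mott Haven"],
    "Borough": ["Manhattan", "Bronx", "Staten Island", "Manhattan",
                "Manhattan", "Bronx", "Bronx", "Bronx", "Bronx"],
})


def top_n_neighborhoods(frame, n=5):
    """Return the n most frequent neighborhoods within each borough."""
    # Count pickups per (Borough, Neighborhood) pair.
    counts = frame.groupby("Borough")["Neighborhood"].value_counts()
    # Regroup by borough only (index level 0) and keep the n largest counts.
    # group_keys=False keeps the original (Borough, Neighborhood) index
    # instead of prepending the borough a second time.
    return counts.groupby(level=0, group_keys=False).nlargest(n)


print(top_n_neighborhoods(df, n=1))
```

Since `value_counts` on a groupby already sorts each group in descending order by default, `counts.groupby(level=0).head(n)` should give the same rows; `nlargest` just makes the intent explicit.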

answered Sep 19 '22 02:09

jezrael
