Show first 10 rows of multi-index pandas dataframe

I have a pandas DataFrame with a multilevel index, where the first level is year and the second level is username. It has a single column, which is already sorted in descending order. I want to show the first 2 rows for each value of index level 0.

What I have:

               count
year username                
2010 b         677
     a         505
     c         400
     d         300
 ...
2014 a         100
     b         80

What I want:

               count
year username                
2010 b         677
     a         505
2011 c         677
     d         505
2012 e         677
     f         505
2013 g         677
     i         505
2014 h         677
     j         505
David asked Sep 13 '15 19:09


2 Answers

Here is an answer. Maybe there is a better way to do this (with indexing?), but I think it works. The principle looks complex but is quite simple:

  • Index the DataFrame by year and username.
  • Group the DataFrame by year, which is the first level (=0) of the index.
  • Apply two operations to each sub-DataFrame produced by the groupby (one per year):
    • Sort the rows by count in ascending order, so that the rows with the highest counts end up at the tail of the DataFrame. The original call was sort_index(by='count'); later pandas releases deprecated that form in favor of sort_values('count').
    • Keep only the last top rows (2 in this case) using negative slicing ([-top:]). The tail method (tail(top)) could also be used to improve readability.
  • Drop the redundant year level that apply adds to the index: droplevel(0).

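import pandas as pd

# Test data
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014],
                   'username': ['b', 'a', 'a', 'c', 'c', 'd', 'e', 'f', 'g', 'i', 'h', 'j'],
                   'count': [400, 505, 678, 677, 505, 505, 677, 505, 677, 505, 677, 505]})
df = df.set_index(['year', 'username'])

top = 2
# Within each year, sort by count (ascending) and keep the last `top` rows.
# sort_values replaces the deprecated sort_index(by='count').
df = df.groupby(level=0).apply(lambda g: g.sort_values('count')[-top:])
# apply prepends the group key (year) as an extra index level; drop it
df.index = df.index.droplevel(0)
df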

               count
year username       
2010 a           505
     a           678
2011 d           505
     c           677
2012 f           505
     e           677
2013 i           505
     g           677
2014 j           505
     h           677
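
The tail variant mentioned in the steps above might look like the sketch below. It assumes a pandas release where sort_values is available, and it uses a small made-up frame (sample, not the test data above). Passing group_keys=False is a choice made here so that apply does not add an extra year level, which makes the droplevel step unnecessary.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
sample = pd.DataFrame({
    'year': [2010, 2010, 2010, 2011, 2011, 2011],
    'username': ['a', 'b', 'c', 'd', 'e', 'f'],
    'count': [400, 505, 678, 300, 677, 505],
}).set_index(['year', 'username'])

top = 2
# Sort each year's rows by count (ascending) and keep the last `top` rows.
# group_keys=False stops apply from prepending the group key to the index.
top_rows = sample.groupby(level=0, group_keys=False).apply(
    lambda g: g.sort_values('count').tail(top)
)
print(top_rows)
```

As in the original code, the rows within each year come out in ascending order of count.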
Romain answered Nov 15 '22 08:11



I ran into the same problem and found a neater answer in the docs (pandas version 1.0.1), in the section "GroupBy: taking the first rows of each group". Here is the trick, assuming your DataFrame is called df:

df.groupby(level=0).head(2)
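
As a minimal sketch of how this behaves, assume a small made-up frame that is already sorted by count in descending order within each year, as described in the question. The data and the name first_two are illustrative only.

```python
import pandas as pd

# Hypothetical data, already sorted by count (descending) within each year
df = pd.DataFrame({
    'year': [2010, 2010, 2010, 2011, 2011, 2011],
    'username': ['b', 'a', 'c', 'd', 'e', 'f'],
    'count': [677, 505, 400, 650, 500, 100],
}).set_index(['year', 'username'])

# head(n) on a groupby keeps the first n rows of each group,
# in their existing order, and leaves the index unchanged
first_two = df.groupby(level=0).head(2)
print(first_two)
```

Because head does not sort, the rows need to be in the desired order already; otherwise a sort (for example with sort_values) would have to come first.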
Lightspark answered Nov 15 '22 06:11
