
Pandas dataframe selective data cleaning post groupby

I am new to pandas and would like to know how to clean data by keeping only some of the rows in each group. Say I have a dataframe as follows:

column1      date    key
A            2016    SB
A            2017    B
B            2015    SB
C            2014    SB
C            2014    PB
C            2015    B
C            2016    SB

How do I clean the data so that, for each distinct column1 value, I keep only the first two rows and ignore the rest (for example, for C I only get 2014 SB and 2014 PB)? The expected result:

column1      date    key
A            2016    SB
A            2017    B
B            2015    SB
C            2014    SB
C            2014    PB

Thank you

hellochan asked Dec 06 '22 14:12



1 Answer

You need GroupBy.head, which returns the first n rows of each group and keeps the original row order and index (see also the docs):

# keep the first two rows of each column1 group
df = df.groupby('column1').head(2)
print(df)
  column1  date key
0       A  2016  SB
1       A  2017   B
2       B  2015  SB
3       C  2014  SB
4       C  2014  PB
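
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question. The variable names `first_two` and `first_two_alt` are just illustrative, and the second variant using GroupBy.cumcount is an alternative I am adding for comparison, not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "column1": ["A", "A", "B", "C", "C", "C", "C"],
    "date": [2016, 2017, 2015, 2014, 2014, 2015, 2016],
    "key": ["SB", "B", "SB", "SB", "PB", "B", "SB"],
})

# First two rows of each column1 group, original order and index kept
first_two = df.groupby("column1").head(2)

# Alternative: number the rows within each group (0, 1, 2, ...)
# and keep those whose position is below 2
first_two_alt = df[df.groupby("column1").cumcount() < 2]

print(first_two)
```

Both variants should select the same rows here. The cumcount form can be handy if you need a different condition on the within-group position, such as skipping the first row.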
jezrael answered Mar 13 '23 02:03
