I am new to pandas and would like to know how to clean data by keeping only some of the rows. Say I have a DataFrame as follows:
column1 date key
A 2016 SB
A 2017 B
B 2015 SB
C 2014 SB
C 2014 PB
C 2015 B
C 2016 SB
How do I clean the data so that, for each distinct column1 value, I keep only the first two rows and ignore the rest? For example, for C I should get only 2014 SB and 2014 PB. The expected output is:
column1 date key
A 2016 SB
A 2017 B
B 2015 SB
C 2014 SB
C 2014 PB
Thank you
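You need GroupBy.head, which returns the first n rows of each group while keeping the original row order and index (see also the pandas docs):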
df = df.groupby('column1').head(2)
print(df)
column1 date key
0 A 2016 SB
1 A 2017 B
2 B 2015 SB
3 C 2014 SB
4 C 2014 PB
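For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the variable names first_two, mask and first_two_alt are only illustrative. It also shows a cumcount-based filter that should select the same rows, which can be handy if you need the boolean mask for something else.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "column1": ["A", "A", "B", "C", "C", "C", "C"],
    "date": [2016, 2017, 2015, 2014, 2014, 2015, 2016],
    "key": ["SB", "B", "SB", "SB", "PB", "B", "SB"],
})

# Keep the first two rows of each column1 group
first_two = df.groupby("column1").head(2)

# Alternative: number the rows within each group (0, 1, 2, ...)
# and keep those whose position is below 2
mask = df.groupby("column1").cumcount() < 2
first_two_alt = df[mask]

print(first_two)
```

Both approaches rely on the rows already being in the order you want. If "first" should mean earliest date, sort with df.sort_values first.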