I have a dataframe with about half a million rows. As far as I can see, there are plenty of duplicate rows, so how can I drop rows that have the same value in all of the columns (about 80 columns), not just one?
df:
period_start_time    id   val1 val2 val3
06.13.2017 22:00:00  i53  32   2    10
06.13.2017 22:00:00  i32  32   2    10
06.13.2017 22:00:00  i32  4    2    8
06.13.2017 22:00:00  i32  4    2    8
06.13.2017 22:00:00  i32  4    2    8
06.13.2017 22:00:00  i20  7    7    22
06.13.2017 22:00:00  i20  7    7    22
Desired output:
period_start_time    id   val1 val2 val3
06.13.2017 22:00:00  i53  32   2    10
06.13.2017 22:00:00  i32  32   2    10
06.13.2017 22:00:00  i32  4    2    8
06.13.2017 22:00:00  i20  7    7    22
Use drop_duplicates, which by default compares all columns and keeps the first occurrence of each duplicated row:
df = df.drop_duplicates()
print(df)
     period_start_time   id  val1  val2  val3
0  06.13.2017 22:00:00  i53    32     2    10
1  06.13.2017 22:00:00  i32    32     2    10
2  06.13.2017 22:00:00  i32     4     2     8
5  06.13.2017 22:00:00  i20     7     7    22
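A self-contained sketch reproducing the example above. Note the index keeps gaps (0, 1, 2, 5) after dropping rows; the `reset_index(drop=True)` step at the end is an optional extra, not part of the original answer, in case you want a clean 0..n-1 index:

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "period_start_time": ["06.13.2017 22:00:00"] * 7,
    "id": ["i53", "i32", "i32", "i32", "i32", "i20", "i20"],
    "val1": [32, 32, 4, 4, 4, 7, 7],
    "val2": [2, 2, 2, 2, 2, 7, 7],
    "val3": [10, 10, 8, 8, 8, 22, 22],
})

# Drop rows identical across ALL columns; keep the first occurrence.
# (A subset=[...] argument would restrict the comparison to chosen columns.)
deduped = df.drop_duplicates()

# Optional: renumber the index after dropping rows
deduped = deduped.reset_index(drop=True)
print(deduped)
```

With 80 columns there is nothing extra to do: omitting `subset` already means "all columns".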