Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete duplicate rows with the same value in all columns in pandas

I have a dataframe with about a half a million rows. As I could see, there are plenty of duplicate rows, so how can I drop duplicate rows that have the same value in all of the columns (about 80 columns), not just one?

df:

period_start_time    id    val1    val2    val3
06.13.2017 22:00:00  i53    32      2       10
06.13.2017 22:00:00  i32    32      2       10
06.13.2017 22:00:00  i32    4       2       8
06.13.2017 22:00:00  i32    4       2       8
06.13.2017 22:00:00  i32    4       2       8
06.13.2017 22:00:00  i20    7       7       22
06.13.2017 22:00:00  i20    7       7       22

Desired output:

period_start_time    id    val1    val2    val3
06.13.2017 22:00:00  i53    32      2       10
06.13.2017 22:00:00  i32    32      2       10
06.13.2017 22:00:00  i32    4       2       8
06.13.2017 22:00:00  i20    7       7       22
like image 893
jovicbg Avatar asked Dec 19 '22 06:12

jovicbg


1 Answers

Use drop_duplicates:

df = df.drop_duplicates()
print (df)
     period_start_time   id  val1  val2  val3
0  06.13.2017 22:00:00  i53    32     2    10
1  06.13.2017 22:00:00  i32    32     2    10
2  06.13.2017 22:00:00  i32     4     2     8
5  06.13.2017 22:00:00  i20     7     7    22
like image 168
jezrael Avatar answered Dec 20 '22 20:12

jezrael