
Deleting rows that are duplicated in one column based on the conditions of another column

Here is an example of my data set:

Date        Time(GMT)  Depth  Temp   Salinity  Density  Phosphate
24/06/2002  1000       1             33.855             0.01
24/06/2002  1000       45            33.827             0.01
01/07/2002  1000       10     13.26  33.104    24.873   0.06
01/07/2002  1000       30     12.01  33.787    25.646   0.13
08/07/2002  1000       5      13.34  33.609    25.248   0.01
08/07/2002  1000       40     12.01  34.258    26.011   1.33
15/07/2002  1000       30     12.04  34.507    26.199   0.01
22/07/2002  1000       5      13.93  33.792    25.269   0.01
22/07/2002  1000       30     11.9   34.438    26.172   0.08
29/07/2002  1000       5      13.23  34.09     25.642   0.01

I want to delete duplicate rows so that I only have one row per date. I want to do this based on Depth: I would like to keep the row with the greatest (deepest) depth. Any ideas?

asked Jun 03 '14 by helen.h

2 Answers

Let's say you have data in df:

    df = df[order(df[,'Date'], -df[,'Depth']), ]
    df = df[!duplicated(df$Date), ]

This sorts by Date ascending and Depth descending, then keeps only the first row seen for each date, which is the deepest one.
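As a runnable sketch, here is that approach applied to a few rows of the sample data. One assumption worth flagging: Date is parsed with as.Date, since the raw dd/mm/yyyy strings would otherwise sort as text rather than chronologically.

```r
# A few rows from the question's data set, with Date parsed so it
# sorts chronologically (dd/mm/yyyy strings would sort as text)
df <- data.frame(
  Date  = as.Date(c("24/06/2002", "24/06/2002", "01/07/2002", "01/07/2002"),
                  format = "%d/%m/%Y"),
  Depth = c(1, 45, 10, 30),
  Salinity = c(33.855, 33.827, 33.104, 33.787)
)

# Sort by Date ascending and Depth descending, then keep only the
# first (deepest) row seen for each date
df <- df[order(df$Date, -df$Depth), ]
df <- df[!duplicated(df$Date), ]

df$Depth  # depths kept: 45 for 24/06, 30 for 01/07
```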
answered Oct 17 '22 by vrajs5


Here's one way to do it in a single dplyr chain:

    library(dplyr)

    # Remove any duplicates: sort so the deepest row per date comes
    # first, then drop the later (shallower) rows for each date
    df <- df %>%
      arrange(Date, -Depth) %>%
      filter(!duplicated(Date))
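For completeness, the same result can be had in base R without sorting at all, by computing the maximum depth per date and joining it back. This is a sketch on a few sample rows, not part of either answer; note that merge() would keep both rows if two rows tied for a date's maximum depth.

```r
# Sample rows from the question
df <- data.frame(
  Date  = c("24/06/2002", "24/06/2002", "01/07/2002", "01/07/2002"),
  Depth = c(1, 45, 10, 30),
  Phosphate = c(0.01, 0.01, 0.06, 0.13)
)

# Maximum depth per date, joined back to recover the full rows;
# merge() joins on the shared columns Date and Depth
deepest <- aggregate(Depth ~ Date, data = df, FUN = max)
result  <- merge(df, deepest)

result$Depth  # one row per date, at its greatest depth
```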
answered Oct 17 '22 by Ryan Bradley