
Remove duplicates keeping entry with largest absolute value

Let's say I have four samples: id=1, 2, 3, and 4, with one or more measurements on each of those samples:

    > a <- data.frame(id=c(1,1,2,2,3,4), value=c(1,2,3,-4,-5,6))
    > a
      id value
    1  1     1
    2  1     2
    3  2     3
    4  2    -4
    5  3    -5
    6  4     6

I want to remove duplicates, keeping only one entry per ID - the one having the largest absolute value of the "value" column. I.e., this is what I want:

    > a[c(2,4,5,6), ]
      id value
    2  1     2
    4  2    -4
    5  3    -5
    6  4     6

How might I do this in R?

Stephen Turner asked Oct 09 '12 18:10




1 Answer

First, sort so that the less desired rows come last within each id group:

    aa <- a[order(a$id, -abs(a$value)), ]  # sort by id, then by descending abs(value)

Then remove every row after the first within each id group:

    aa[ !duplicated(aa$id), ]              # take the first row within each id
      id value
    2  1     2
    4  2    -4
    5  3    -5
    6  4     6
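The two steps above can also be wrapped into one reusable function. This is only a sketch: the function name `keep_largest_abs` and its `id_col`/`value_col` arguments are made up for illustration and are not part of base R. It relies only on `order()`, `abs()`, and `duplicated()`, exactly as in the answer.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# for each distinct value of `id_col`, keep the row whose `value_col`
# has the largest absolute value.
keep_largest_abs <- function(df, id_col = "id", value_col = "value") {
  # Step 1: sort by id, then by decreasing absolute value within each id
  sorted <- df[order(df[[id_col]], -abs(df[[value_col]])), ]
  # Step 2: duplicated() flags every row after the first within each id,
  # so negating it keeps only the first (largest-magnitude) row
  sorted[!duplicated(sorted[[id_col]]), ]
}

# Sample data from the question
a <- data.frame(id = c(1, 1, 2, 2, 3, 4), value = c(1, 2, 3, -4, -5, 6))
result <- keep_largest_abs(a)
print(result)
```

If two rows of the same id tie on absolute value (say 4 and -4), the row that appears first in the original data frame should be the one kept, since `order()` leaves unresolved ties in their original order.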
IRTFM answered Sep 21 '22 17:09