Let's say I have four samples: id=1, 2, 3, and 4, with one or more measurements on each of those samples:
    > a <- data.frame(id=c(1,1,2,2,3,4), value=c(1,2,3,-4,-5,6))
    > a
      id value
    1  1     1
    2  1     2
    3  2     3
    4  2    -4
    5  3    -5
    6  4     6
I want to remove duplicates, keeping only one entry per ID - the one having the largest absolute value of the "value" column. I.e., this is what I want:
    > a[c(2,4,5,6), ]
      id value
    2  1     2
    4  2    -4
    5  3    -5
    6  4     6
How might I do this in R?
First, sort the rows so that the less desired ones come last within each id group:
    aa <- a[order(a$id, -abs(a$value)), ]  # sort by id, then by decreasing abs(value)
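To see what the sort does, here is a quick check on the example data from the question. It is only a sketch for inspection; the row names of `aa` show where each original row ended up.

```r
a <- data.frame(id = c(1, 1, 2, 2, 3, 4), value = c(1, 2, 3, -4, -5, 6))

# order() sorts ascending, so negating abs(value) puts the largest
# magnitude first within each id
aa <- a[order(a$id, -abs(a$value)), ]

# original row numbers in their new order
rownames(aa)  # "2" "1" "4" "3" "5" "6"
```

Within each id, the row with the largest absolute value now comes first, which is what the next step relies on.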
Then remove every row after the first within each id group:
    aa[ !duplicated(aa$id), ]  # take the first row within each id
      id value
    2  1     2
    4  2    -4
    5  3    -5
    6  4     6
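If this is needed more than once, the two steps can be wrapped in a small function. The sketch below is one possible way to do it: `keep_max_abs` and its `id_col` / `value_col` arguments are names made up for illustration, not anything from base R.

```r
# keep_max_abs is a hypothetical helper name, not part of base R
keep_max_abs <- function(df, id_col = "id", value_col = "value") {
  # sort by id, then by decreasing absolute value within each id
  sorted <- df[order(df[[id_col]], -abs(df[[value_col]])), ]
  # duplicated() flags every repeat of an id after its first occurrence,
  # so negating it keeps only the first (largest-magnitude) row per id
  sorted[!duplicated(sorted[[id_col]]), ]
}

a <- data.frame(id = c(1, 1, 2, 2, 3, 4), value = c(1, 2, 3, -4, -5, 6))
result <- keep_max_abs(a)

# the surviving rows should be the ones the question asked for
stopifnot(identical(rownames(result), c("2", "4", "5", "6")))
```

If two rows in the same id group tie on absolute value, `order()` leaves them in their original relative order, so the one that appears first in the input is the one kept.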