Remove duplicates keeping entry with largest absolute value

Tags: r, duplicates, duplicate-removal

Let's say I have four samples: id=1, 2, 3, and 4, with one or more measurements on each of those samples:

    > a <- data.frame(id=c(1,1,2,2,3,4), value=c(1,2,3,-4,-5,6))
    > a
      id value
    1  1     1
    2  1     2
    3  2     3
    4  2    -4
    5  3    -5
    6  4     6

I want to remove duplicates, keeping only one entry per ID - the one having the largest absolute value of the "value" column. I.e., this is what I want:

    > a[c(2,4,5,6), ]
      id value
    2  1     2
    4  2    -4
    5  3    -5
    6  4     6

How might I do this in R?

asked Oct 09 '12 18:10

Stephen Turner

1 Answer

First, sort the rows so that, within each id group, the entry with the largest absolute value comes first:

    aa <- a[order(a$id, -abs(a$value)), ]  # sort by id, then by decreasing abs(value)

Then drop every row after the first within each id group:

    aa[!duplicated(aa$id), ]               # keep the first row within each id
      id value
    2  1     2
    4  2    -4
    5  3    -5
    6  4     6
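For reuse, the two steps can be wrapped in a small function. This is only a minimal sketch: the name keep_largest_abs and its id_col / value_col arguments are hypothetical and not part of the answer above. It assumes the value column is numeric.

```r
# Hypothetical helper (not from the original answer): for each id, keep the
# row whose value has the largest absolute value.
keep_largest_abs <- function(df, id_col = "id", value_col = "value") {
  # Sort by id, then by decreasing |value|, so the wanted row leads each group.
  sorted <- df[order(df[[id_col]], -abs(df[[value_col]])), ]
  # duplicated() flags every occurrence of an id after the first one.
  sorted[!duplicated(sorted[[id_col]]), ]
}

# Same sample data as in the question.
a <- data.frame(id = c(1, 1, 2, 2, 3, 4), value = c(1, 2, 3, -4, -5, 6))
result <- keep_largest_abs(a)
print(result)
```

On the sample data this keeps rows 2, 4, 5 and 6, matching the output requested in the question. If two rows of the same id tie on absolute value (say 3 and -3), the one that appears first in the input should be kept, because order() leaves ties in their original order.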

answered Sep 21 '22 17:09

IRTFM
