 

How to make rows unique in R by column A and keep the row with the maximum value in column B

Tags:

r

unique

I have a data.frame with 17 columns. Column 2 has several rows with the same value, and I want to keep only one of those rows: specifically, the one with the maximum value in column 17.

For example:

A    B
'a'  1
'a'  2
'a'  3
'b'  5
'b'  200

The result should be:
A    B
'a'  3
'b'  200

(plus the rest of the columns)

So far I've been using the unique() function, but it doesn't let me choose which row is kept; I think it either keeps one at random or just keeps the first one that appears.

**UPDATE** The real data has 376,000 rows. I've tried the data.table and ddply suggestions, but they take forever. Which approach is the most efficient?

asked Jan 14 '23 12:01 by biojl
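On the unique() point: in base R, unique() and duplicated() are deterministic and keep the first occurrence rather than a random one, so !duplicated(x) selects the first row for each value of x. A minimal sketch of a common base R idiom built on that behavior, assuming the key and value columns are called A and B as in the example: sort so the largest B comes first within each A, then keep the first row per A. Column C is a hypothetical stand-in for the remaining columns.

```r
# Example data from the question; C is a hypothetical stand-in for the other columns
dat <- data.frame(A = c("a", "a", "a", "b", "b"),
                  B = c(1, 2, 3, 5, 200),
                  C = c("x1", "x2", "x3", "x4", "x5"),
                  stringsAsFactors = FALSE)

# Order by A, then by B decreasing, so the max-B row comes first in each group
sorted <- dat[order(dat$A, -dat$B), ]

# duplicated() flags every repeat after the first occurrence; negating it keeps
# exactly one row per value of A, namely the one with the largest B
result <- sorted[!duplicated(sorted$A), ]
result
```

Because the whole operation is one sort plus one vectorized pass, it avoids any per-group function calls. If several rows tie for the maximum B, the one that appears first after sorting is kept.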


2 Answers

A solution using the data.table package:

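set.seed(42)
# Example data: A is the grouping column, B the value to maximize, C stands in for the other columns
dat <- data.frame(A=c('a','a','a','b','b'),B=c(1,2,3,5,200),C=rnorm(5))
library(data.table)

dat <- as.data.table(dat)
# For each group of A, keep the row of .SD (that group's subset of the table) where B is largest
dat[, .SD[which.max(B)], by = A]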

   A   B         C
1: a   3 0.3631284
2: b 200 0.4042683
answered Jan 17 '23 01:01 by Roland
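Since the update reports that this is slow on 376,000 rows: building a .SD subset for every group tends to be the costly part. A commonly used data.table variant instead collects only the row number of each group's maximum through the special symbol .I, then subsets the table once. This is a sketch assuming a reasonably recent data.table release; no timings on the real data are claimed here.

```r
library(data.table)

set.seed(42)
dat <- data.table(A = c("a", "a", "a", "b", "b"),
                  B = c(1, 2, 3, 5, 200),
                  C = rnorm(5))

# .I holds the row numbers of dat. For each group of A, keep the row number
# where B is largest; the result has columns A and V1 (those row numbers).
idx <- dat[, .I[which.max(B)], by = A]$V1

# Subset the full table once with the collected row numbers
result <- dat[idx]
result
```

Groups come out in order of first appearance of A, as with the .SD version above.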


A less elegant solution using base R functions:

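ind <- with(dat, tapply(B, A, which.max))  # Using @Roland's data: position of the max B within each group of A
mysplit <- split(dat, dat$A)                # one data frame per value of A, in the same order as ind
do.call(rbind, lapply(seq_along(mysplit), function(i) mysplit[[i]][ind[i], ]))
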
  A   B         C
3 a   3 0.3631284
5 b 200 0.4042683
answered Jan 17 '23 00:01 by Jilber Urbina
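Regarding the update's question about speed: one rough way to compare a split-based approach (as in the answer above) with a single sort plus duplicated() is to time both on data of a similar size. The data below is synthetic and hypothetical (376,000 rows, 5,000 distinct keys); timings depend heavily on the machine and on the number of distinct keys, so none are claimed here.

```r
# Hypothetical synthetic data roughly the size mentioned in the update:
# 376,000 rows, 5,000 distinct keys in A, a numeric B and one extra column
set.seed(1)
n <- 376000
big <- data.frame(A = sample(sprintf("id%04d", 1:5000), n, replace = TRUE),
                  B = runif(n),
                  C = seq_len(n),
                  stringsAsFactors = FALSE)

# Approach 1: split into one data frame per key, take each piece's max-B row,
# then rbind the pieces (per-group work, so cost grows with the number of keys)
t_split <- system.time({
  pieces <- lapply(split(big, big$A), function(d) d[which.max(d$B), ])
  res_split <- do.call(rbind, pieces)
})

# Approach 2: one vectorized sort (A ascending, B descending) plus duplicated()
t_sort <- system.time({
  s <- big[order(big$A, -big$B), ]
  res_sort <- s[!duplicated(s$A), ]
})

print(t_split)
print(t_sort)
```

Both approaches should return one row per key with the same B values; only the time taken differs.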