Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to aggregate by names in two columns in R?

I got a Data Frame:

a <- c('A','A','B','B','A')
b <- c(1,1,1,1,2)
c <- c(NA,60,NA,100,NA)
d <- c(10,NA,10,NA,100)

frame <- data.frame(a,b,c,d)

> frame
  a  b   c  d
1 A  1  NA  10
2 A  1  60  NA
3 B  1  NA  10
4 B  1 100  NA
5 A  2  NA  100

And I want to aggregate it by a and b

>frame2
  a  b   c  d
1 A  1  60  10
3 B  1 100  10
5 A  2  NA  100

I tried several things like aggregat() and group from dplyr but somehow it never works. I guess the NA is a problem.

like image 268
sfetan Avatar asked Dec 10 '22 01:12

sfetan


2 Answers

With aggregate, we may need to use na.action

aggregate(.~ a + b, frame, sum, na.rm = TRUE, na.action = 'na.pass')
#   a b   c   d
#1 A 1  60  10
#2 B 1 100  10
#3 A 2   0 100

If we intend to subset the rows

library(dplyr)
frame %>% 
    group_by(a, b) %>%
    mutate_at(vars(-group_cols()), ~ .[order(is.na(.))]) %>% 
    slice(1)
# A tibble: 3 x 4
# Groups:   a, b [3]
#  a         b     c     d
#  <fct> <dbl> <dbl> <dbl>
#1 A         1    60    10
#2 A         2    NA   100
#3 B         1   100    10
like image 171
akrun Avatar answered Jan 03 '23 16:01

akrun


Using data.table and hablar::sum_:

library(data.table)

setDT(frame)[,.(c = as.numeric(hablar::sum_(c)), 
                d = as.numeric(hablar::sum_(d))), .(a,b)]
#>    a b   c   d
#> 1: A 1  60  10
#> 2: B 1 100  10
#> 3: A 2  NA 100

Or in base we can define our own function and use it with aggregate as akrun illustrated in their answer:

sum__ <- function(x){if(all(is.na(x))) NA_real_ else sum(x, na.rm=T)}

aggregate(.~ a + b, frame, sum__, na.action = 'na.pass')
like image 38
M-- Avatar answered Jan 03 '23 16:01

M--