
Aggregate by NA in R

Tags: r, aggregate, na

Does anybody know how to aggregate by NA in R?

Take the example below:

a <- matrix(1,5,2)
a[1:2,2] <- NA
a[3:5,2] <- 2
aggregate(a[,1], by=list(a[,2]), sum)

The output is:

Group.1 x
2       3

Is there a way to include the NA group in the output, like this:

Group.1 x
2       3
NA      2

Thanks

asked Aug 25 '15 21:08 by wilsonm2



3 Answers

Instead of aggregate(), you may want to consider rowsum(). It is designed for exactly this operation on matrices and is known to be much faster than aggregate(). We can add NA to the factor levels of a[, 2] with addNA(), which ensures that NA shows up as a grouping value.

rowsum(a[, 1], addNA(a[, 2]))
#      [,1]
# 2       3
# <NA>    2

If you still want to use aggregate(), you can incorporate addNA() as well.

aggregate(a[, 1], list(Group = addNA(a[, 2])), sum)
#   Group x
# 1     2 3
# 2  <NA> 2
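
As a minimal sketch of why addNA() helps here: the documentation for aggregate() says that rows with a missing value in any of the by variables are omitted from the result, so the NA rows never reach sum(). Converting the grouping column with addNA() makes NA an explicit factor level. The names g_plain, g_na and res below are illustrative and not from the answer above.

```r
# Same matrix as in the question
a <- matrix(1, 5, 2)
a[1:2, 2] <- NA
a[3:5, 2] <- 2

# A plain factor has no level for NA, so those rows belong to no group
g_plain <- factor(a[, 2])
levels(g_plain)

# addNA() appends NA as an extra level, so the NA rows form their own group
g_na <- addNA(a[, 2])
levels(g_na)

# Grouping on the factor with the NA level keeps the NA rows in the result
res <- aggregate(a[, 1], list(Group = g_na), sum)
res
```

Comparing levels(g_plain) with levels(g_na) shows the extra NA level that the grouping relies on.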

One more option, with data.table:

library(data.table)
as.data.table(a)[, .(x = sum(V1)), by = .(Group = V2)]
#    Group x
# 1:    NA 2
# 2:     2 3
answered Nov 02 '22 23:11 by Rich Scriven


Use summarize() from dplyr. group_by() keeps NA as its own group:

library(dplyr)

a %>%
  as.data.frame %>%
  group_by(V2) %>%
  summarize(V1_sum = sum(V1))
answered Nov 02 '22 23:11 by bramtayl


Using sqldf:

library(sqldf)

# Convert the matrix to a data frame; its columns are named V1 and V2
a <- as.data.frame(a)
sqldf("SELECT V2 [Group], SUM(V1) x
      FROM a
      GROUP BY V2")

Output:

  Group x
1    NA 2
2     2 3

stats package

A variation of AdamO's proposal:

data.frame(xtabs(V1 ~ V2, data = a, na.action = na.pass, exclude = NULL))

Output:

    V2 Freq
1    2    3
2 <NA>    2
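
A hedged variant of the same idea: newer versions of R give xtabs() an addNA argument (assumed here to be R 3.4.0 or later; check ?xtabs on your installation) that adds NA as a level of the grouping variable. The sketch below rebuilds the data from the question so it runs on its own; the name tab is illustrative.

```r
# Rebuild the example data as a data frame with columns V1 and V2
a <- matrix(1, 5, 2)
a[1:2, 2] <- NA
a[3:5, 2] <- 2
a <- as.data.frame(a)

# Assumption: this R version supports the addNA argument of xtabs()
tab <- xtabs(V1 ~ V2, data = a, addNA = TRUE)

# Convert the table to a data frame, as in the answer above
data.frame(tab)
```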
answered Nov 03 '22 01:11 by mpalanco