
How do I summarise all columns except one(s) I specify?

Tags: r, dplyr

I want to sum up all but one numerical column in this dataframe.

Group, Registered, Votes, Beans
A,     111,        12,     100
A,     111,        13,     200
A,     111,        14,     300

I want to group this by Group, summing up all the columns except Registered.

summarise_if(
  .tbl = group_by(
    .data = x,
    Group
  ),
  .predicate = is.numeric,
  .funs = sum
)

The problem is that the result sums ALL the numeric columns, including Registered. How do I sum every numeric column except Registered?

The output I want would look like this:

Group, Registered, Votes, Beans
A,     111,        39,    600
asked Nov 28 '18 15:11



2 Answers

I would use summarise_at with a logical vector that is TRUE only for numeric columns other than Registered (which() then turns it into column positions), i.e.

df %>% 
  summarise_at(which(sapply(df, is.numeric) & names(df) != 'Registered'), sum)

If you wanted to just summarise all but one column you could do

df %>% 
  summarise_at(vars(-Registered), sum)

but this sums every remaining column, so it only works if all of them are numeric (a character column such as Group would make sum fail).

Notes:

  • factors are stored as integer codes, but is.numeric() already returns FALSE for them, so sapply(df, is.numeric) excludes factor columns without an extra !is.factor(x) check

  • is.numeric() only inspects a column's type, not its values, so sapply(df, is.numeric) costs about the same however many rows there are; subsetting to df[1,] first should not be needed for speed
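
As a quick check on the factor note, here is a minimal base R sketch of how a factor is stored versus what is.numeric() reports. The vector f is a made-up example, not part of the question's data.

```r
# Hypothetical factor whose labels happen to look like numbers
f <- factor(c("10", "20", "30"))

is.numeric(f)   # FALSE: factors are not reported as numeric
typeof(f)       # "integer": the underlying level codes
as.integer(f)   # 1 2 3: level codes, not the original labels
```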

Edit:

Modified versions of the two methods above for dplyr >= 1.0.0, where summarise_at is superseded by across():

df %>% 
  summarise(across(where(is.numeric) & !Registered, sum))

df %>% 
  summarise(across(-Registered, sum))
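
Since the question asks for a grouped result, here is a sketch of the across() approach combined with group_by(), using a data frame rebuilt from the question's sample rows. The second variant rests on an assumption that the question does not state: that Registered is constant within each Group, so it can be carried along as a grouping key to reproduce the desired output.

```r
library(dplyr)

# Sample data reconstructed from the question
df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111, 111, 111),
  Votes      = c(12, 13, 14),
  Beans      = c(100, 200, 300)
)

# Grouping columns are never selected by across(), so only Votes and
# Beans are summed here; Registered is dropped from the result.
summed <- df %>%
  group_by(Group) %>%
  summarise(across(where(is.numeric) & !Registered, sum), .groups = "drop")

# Assumption: Registered is constant within each Group, so it can be
# kept as an extra grouping key instead of being summed.
kept <- df %>%
  group_by(Group, Registered) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")
```

If Registered can vary within a group, the second variant would produce one row per Group/Registered pair, which is probably not what is wanted.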
IceCreamToucan answered Sep 28 '22 20:09


We can drop Registered with select() and then use summarise_if:

library(dplyr)
df %>% 
   select(-Registered) %>%
   summarise_if(is.numeric, sum)
#  Votes Beans
#1    39   600
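
The same idea can be sketched per group, as the question asks. The data frame below is rebuilt from the question's sample rows, and grouping by Group before select() is an addition here, not part of the answer as posted.

```r
library(dplyr)

# Sample data reconstructed from the question
df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111, 111, 111),
  Votes      = c(12, 13, 14),
  Beans      = c(100, 200, 300)
)

# Group first, drop Registered, then sum the remaining numeric columns.
# The grouping column Group is kept in the result.
res <- df %>%
  group_by(Group) %>%
  select(-Registered) %>%
  summarise_if(is.numeric, sum)
```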
akrun answered Sep 28 '22 20:09