Automatic rounding in dplyr::summarise() function [duplicate]

Tags:

rounding

r

dplyr

I am wondering if there is any easy way to specify the number of digits reported by summarise in dplyr, ideally using a native dplyr or other tidyverse function?

Here's some toy data

library(dplyr)

df <- data.frame(group = rep(letters[1:2], each = 10, length.out = 40),
                 large = rnorm(40, 100, 15),
                 small = rnorm(40, 0.5, 0.02))

If we then summarise via

df %>% group_by(group) %>% summarise(mL = mean(large), mS = mean(small)) 

We get

#   group    mL    mS
#   <fct> <dbl> <dbl>
# 1 a     104.  0.496
# 2 b      97.6 0.506

Note that, without any rounding specified, the variable with the larger mean is displayed with at most 1 decimal place, while the variable with the smaller mean is displayed with 3.

Now what if we want the variable with the larger mean to also be reported to 3 decimal places? If we include a command to round like so

df %>% group_by(group) %>% summarise(mL = round(mean(large),3), mS = mean(small))

There is no change in the output

#   group    mL    mS
#   <fct> <dbl> <dbl>
# 1 a     104.  0.496
# 2 b      97.6 0.506

Only if we use the format() function can we obtain what we are after

df %>% group_by(group) %>% summarise(mL = format(round(mean(large), 3), nsmall = 3), mS = mean(small))

#   group mL         mS
#   <fct> <chr>   <dbl>
# 1 a     103.888 0.496
# 2 b     97.626  0.506

Is there an easier way to do this? Ideally using some kind of tidyverse function.

llewmills asked Nov 21 '19 02:11



1 Answer

This is to do with the way tibbles are printed. The actual numbers in the data frame still have all their decimal places; they are just not displayed when the tibble is printed.

You can use as.data.frame() or print.data.frame(), which will show you more decimal places (depending on your getOption("digits")). You can also change the tibble settings, but my understanding is that these are always based on significant figures rather than decimal places (so your values >100 will have fewer decimal places than values <100). See https://tibble.tidyverse.org/reference/formatting.html for tibble printing options.

So

df %>% group_by(group) %>% summarise(mL = round(mean(large),3), mS = round(mean(small),3)) %>%
     as.data.frame()

will give you values to 3 decimal places, and

df %>% group_by(group) %>% summarise(mL = mean(large), mS = mean(small))  %>%
     as.data.frame()

will show up to getOption("digits") significant digits (I think 7 is the default).
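
If you would rather keep the result as a tibble, the printing settings can be adjusted instead. The following is only a rough sketch, not a definitive recipe: it assumes the pillar.sigfig option (significant figures used by tibble printing) and tibble::num() with its digits argument (which I believe needs tibble 3.1.0 or later). The set.seed() call and the names res and fixed are my own additions for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(group = rep(letters[1:2], each = 10, length.out = 40),
                 large = rnorm(40, 100, 15),
                 small = rnorm(40, 0.5, 0.02))

res <- df %>% group_by(group) %>% summarise(mL = mean(large), mS = mean(small))

# The stored values keep full precision; only the tibble display is shortened
print(res$mL, digits = 10)

# Option 1: show more significant figures when printing tibbles
old <- options(pillar.sigfig = 6)
print(res)
options(old)  # restore the previous setting

# Option 2: request a fixed number of decimal places per column
# (assumes tibble >= 3.1.0, where num() is available)
fixed <- res %>% mutate(mL = tibble::num(mL, digits = 3),
                        mS = tibble::num(mS, digits = 3))
print(fixed)
```

Option 1 is still based on significant figures, so large and small values will not line up on the same number of decimals; option 2 only changes how the columns are displayed, and the underlying values are not rounded.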

Also note that if you want to do the same thing to multiple columns in summarise(), summarise_at() can be very helpful, e.g.

df %>% group_by(group) %>% summarise_at(c("large","small"), ~round(mean(.),3)) %>% 
    print.data.frame()
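
In dplyr 1.0.0 and later, summarise_at() is superseded by across(). A minimal sketch of the same multi-column rounding, assuming dplyr >= 1.0.0 is installed (the set.seed() call and the name out are my additions, used only to make the example self-contained):

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(group = rep(letters[1:2], each = 10, length.out = 40),
                 large = rnorm(40, 100, 15),
                 small = rnorm(40, 0.5, 0.02))

# Apply the same rounded mean to both columns, then print as a plain data frame
out <- df %>%
  group_by(group) %>%
  summarise(across(c(large, small), ~ round(mean(.x), 3))) %>%
  as.data.frame()

print(out)
```

Unlike the tibble-printing options, this does change the stored values, so it is better suited to a final reporting step than to intermediate calculations.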

Sarah answered Sep 27 '22 23:09