
Number of significant digits in dplyr summarise

Tags: r, dplyr

I am having trouble getting the desired number of decimal places from summarise. Here is a simple example:

test2  <- data.frame(c("a","a","b","b"), c(245,246,247,248))
library(dplyr)
colnames(test2)  <- c("V1","V2")
group_by(test2,V1) %>% summarise(mean(V2))

The dataframe is:

  V1  V2
1  a 245
2  a 246
3  b 247
4  b 248

The output is:

 V1     `mean(V2)`
 <fctr>      <dbl>
1 a             246
2 b             248

I would like it to give me the means including the decimal place (i.e. 245.5 and 247.5).

asked Jan 19 '18 21:01 by KNW



2 Answers

Because you are using dplyr tools, the result is a tibble, which by default prints numbers with 3 significant digits (see the option pillar.sigfig). This is not the same as the number of digits after the decimal point. The stored values are not rounded; only the printed display is shortened. To see the digits after the decimal point, simply convert the result to a data.frame with as.data.frame.

Note that tibble's concept of significant digits is somewhat complicated: it does not indicate how many digits after the decimal point are shown, but the minimum number of digits needed to represent the number to a given accuracy (I think 99.9%, see discussion here).

This means the number of digits printed depends on the "size" of your number:

library(tibble)
packageVersion("tibble")
#> [1] '2.1.3'
packageVersion("pillar")
#> [1] '1.4.2'
tab <- tibble(x = c(0.1234, 1.1234, 10.1234, 100.1234, 1000.1234))

options(pillar.sigfig=3)
tab
#> # A tibble: 5 x 1
#>          x
#>      <dbl>
#> 1    0.123
#> 2    1.12 
#> 3   10.1  
#> 4  100.   
#> 5 1000.

options(pillar.sigfig=4)
tab
#> # A tibble: 5 x 1
#>           x
#>       <dbl>
#> 1    0.1234
#> 2    1.123 
#> 3   10.12  
#> 4  100.1   
#> 5 1000.

as.data.frame(tab)
#>           x
#> 1    0.1234
#> 2    1.1234
#> 3   10.1234
#> 4  100.1234
#> 5 1000.1234

Created on 2019-08-21 by the reprex package (v0.3.0)
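
Applied to the question's data, the same point can be checked directly. This is a minimal sketch, assuming dplyr is installed; the column name `mean_V2` is an illustrative choice and does not appear in the original question. It shows that the summarised values are stored in full precision and that only tibble printing shortens them.

```r
library(dplyr)

test2 <- data.frame(V1 = c("a", "a", "b", "b"), V2 = c(245, 246, 247, 248))

# mean_V2 is an illustrative name chosen for this sketch
res <- test2 %>%
  group_by(V1) %>%
  summarise(mean_V2 = mean(V2))

# Extracting the column bypasses tibble printing and shows the stored values
res$mean_V2
#> [1] 245.5 247.5

# Raising pillar.sigfig changes only how tibbles are printed in this session
old <- options(pillar.sigfig = 4)
print(res)
options(old)  # restore the previous setting

# Converting to a plain data.frame uses base R printing instead
as.data.frame(res)
```

Setting the option is probably the least intrusive choice when the result should stay a tibble; converting with as.data.frame drops the tibble class.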

answered Sep 21 '22 14:09 by Matifou


This is one solution:

test2  <- data.frame(c("a", "a", "b", "b"), c(245, 246, 247, 248))
library(dplyr)
colnames(test2)  <- c("V1", "V2")
group_by(test2, V1) %>% 
  dplyr::summarise(mean(V2)) %>% 
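  # nsmall = 1 asks format() for at least one decimal place; a bare 1 would be matched to `trim`
  dplyr::mutate_if(is.numeric, format, nsmall = 1)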
#> # A tibble: 2 x 2
#>   V1    `mean(V2)`
#>   <fct> <chr>     
#> 1 a     245.5     
#> 2 b     247.5

Created on 2018-01-20 by the reprex package (v0.1.1.9000).

EDIT:

If you want to keep it numeric:

test2  <- data.frame(c("a", "a", "b", "b"), c(245, 246, 247, 248))
library(dplyr)
colnames(test2)  <- c("V1", "V2")
group_by(test2, V1) %>% 
  dplyr::summarise(mean(V2)) %>% 
  as.data.frame(.) %>% 
  dplyr::mutate_if(is.numeric, round, 1)

Gives:

  V1 mean(V2)
1  a    245.5
2  b    247.5

And with another example (from @Matifou):

tab <- tibble(x = c(0.1234, 1.1234, 10.1234, 100.1234, 1000.1234))

tab %>%  
  as.data.frame(.) %>% 
  dplyr::mutate_if(is.numeric, round, 2)

Gives:

        x
1    0.12
2    1.12
3   10.12
4  100.12
5 1000.12
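
In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by across(). The sketch below shows the same rounding step written that way; it assumes dplyr >= 1.0.0, and the column name `mean_V2` is an illustrative choice rather than part of the original answer.

```r
library(dplyr)

test2 <- data.frame(V1 = c("a", "a", "b", "b"), V2 = c(245, 246, 247, 248))

res <- test2 %>%
  group_by(V1) %>%
  summarise(mean_V2 = mean(V2)) %>%                        # illustrative column name
  mutate(across(where(is.numeric), ~ round(.x, 1))) %>%    # round every numeric column
  as.data.frame()                                          # switch to base R printing

res
```

Here round() changes the stored values, while as.data.frame only changes how they are printed, so the rounding step can be dropped if full precision should be kept.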

answered Sep 19 '22 14:09 by Indrajeet Patil