summary() function rounding error in R?

I have a data frame with 16968 rows (the reason for the exact count will become clear below). I am checking whether a running variable (data$Ob) actually counts every row in sequential order: the first row's data$Ob should be 1, the last row's should be 16968, and every row in between should follow in order.

When I run summary(data$Ob), it reports a maximum of 16970, not 16968. When I run max(data$Ob), it reports a maximum of 16968, which disagrees with summary().

I ran a for loop to check each observation, and it looks like max() is right and data$Ob is doing what it's supposed to. But does anyone know why summary() is off by 2? I assume some kind of rounding error, but this data check is critical to my analysis, and if it's wrong, everything downstream will be bunk.

Here's the for loop I ran, though I don't think it is critical to this question.

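checker <- vector(length = nrow(rd))     # 1 where Ob equals the row index, else 0
na.checker <- vector(length = nrow(rd))  # 1 where Ob is present, 0 where it is NA
for (i in seq_len(nrow(rd))) {
    checker[i] <- ifelse(i == rd$Ob[i], 1, 0)
    na.checker[i] <- ifelse(is.na(rd$Ob[i]), 0, 1)
}
sum(checker)     # equals nrow(rd) only if every row matches (NA if any Ob is missing)
sum(na.checker)  # equals nrow(rd) only if Ob has no missing values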
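The same check can be done without a loop. The sketch below is only an illustration: the data frame rd is a hypothetical stand-in built to match the description (a single Ob column running from 1 to 16968), and is_running_index() is a helper name made up for this example, not an existing R function.

```r
# Hypothetical example data; substitute the real data frame 'rd'
rd <- data.frame(Ob = seq_len(16968))

# Hypothetical helper: TRUE only if x has no missing values
# and equals 1, 2, ..., length(x) in that order
is_running_index <- function(x) {
  !anyNA(x) && all(x == seq_along(x))
}

is_running_index(rd$Ob)                        # the whole check in one call
sum(rd$Ob == seq_len(nrow(rd)), na.rm = TRUE)  # matching rows, like sum(checker)
sum(is.na(rd$Ob))                              # number of missing values
which(rd$Ob != seq_len(nrow(rd)))              # positions of mismatches (NAs are dropped)
```

Comparing the whole column against seq_len(nrow(rd)) at once is the idiomatic vectorized form of the loop; which() is there so that, if the check fails, you can see where.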

Thanks.

asked Jan 13 '23 11:01 by BGTP33


1 Answer

Without a reproducible example it is hard to tell, but it smells like the mother of all FAQs: summary() displays only four significant digits by default, so 16968 is rounded to 16970.
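A minimal sketch of that rounding, assuming the default options(digits = 7): summary() derives its display precision as max(3, getOption("digits") - 3), which gives 4 significant digits, and signif() applies the same kind of rounding. The vector ob is a hypothetical stand-in for data$Ob.

```r
# Hypothetical stand-in for the running variable: 1, 2, ..., 16968
ob <- seq_len(16968)

# Display precision summary() uses by default: 4 when getOption("digits") is 7
default_digits <- max(3L, getOption("digits") - 3L)

# Rounding the true maximum to that many significant digits gives 16970
rounded_max <- signif(max(ob), default_digits)
print(rounded_max)  # 16970 when default_digits is 4
print(max(ob))      # 16968, the value actually stored

# Asking for more significant digits avoids the display rounding
summary(ob, digits = 6)
```

The stored data are untouched; only the number of significant digits used to present the summary differs.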

Edit: We do need your sample data here, because I cannot reproduce this with a naive example:

R> set.seed(42) 
R> df <- data.frame(a=as.numeric(1:16968), b=16968:1, 
+                   c=rnorm(16968), d=runif(16968))
R> summary(df)
       a               b               c                  d           
 Min.   :    1   Min.   :    1   Min.   :-4.04328   Min.   :0.000101  
 1st Qu.: 4243   1st Qu.: 4243   1st Qu.:-0.68271   1st Qu.:0.252515  
 Median : 8484   Median : 8484   Median :-0.00528   Median :0.505090  
 Mean   : 8484   Mean   : 8484   Mean   :-0.00834   Mean   :0.504563  
 3rd Qu.:12726   3rd Qu.:12726   3rd Qu.: 0.66746   3rd Qu.:0.758991  
 Max.   :16968   Max.   :16968   Max.   : 4.32809   Max.   :0.999976  

Edit 2, with h/t to @SimonO101:

R> summary(df$a)                                   ## what OP saw
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    4240    8480    8480   12700   17000 
R> summary(df$a, digits=6)                         ## what OP wanted to see
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    4243    8484    8484   12726   16968 
R> 
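If you want exact values rather than a display tweak, two alternatives are sketched below on the same toy column as df$a. The first asks for the statistics directly with range(), mean() and quantile(), which print with the global digits option (7 by default). The second temporarily raises options(digits); assuming summary() keeps deriving its precision as getOption("digits") - 3, that lifts it to 7 significant digits, enough to show 16968 exactly.

```r
# Same toy column as df$a above
a <- as.numeric(1:16968)

# Exact extremes and centre, computed directly rather than via summary()
range(a)     # minimum and maximum
mean(a)
quantile(a)  # 0%, 25%, 50%, 75%, 100% (type 7, quantile()'s default)

# Alternatively raise the global option for the duration of one call
old <- options(digits = 10)  # options() returns the previous setting
summary(a)
options(old)                 # restore the previous setting
```

Saving and restoring the old options() value keeps the change from leaking into the rest of the session.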
answered Jan 19 '23 11:01 by Dirk Eddelbuettel