Difference between my aggregate() output and tidyverse output

Question

I am using a dataset of flights and am trying to calculate the average arrival and departure delay for each origin airport (origin). I first tried the aggregate function:

average_delay <- aggregate(cbind(arr_delay,dep_delay) ~ origin, FUN = mean, data = flights)
print(average_delay, digits = 3)

I then tried the same thing with the tidyverse:

library(tidyverse)
average_delay_tidy = flights %>%
  group_by(origin) %>% 
  summarise(arr_delay = mean(arr_delay, na.rm = TRUE),
            dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  mutate_if(is.numeric, round, digits = 3) %>%
  print

However, I got different outputs:

Aggregate:

|origin|arr_delay |dep_delay |
|:------|:--------:|:--------:|
|EWR   | 9.56     | **15.0**     |
|JFK   | 5.85     | **12.0**     |
|LGA   | 6.11     | 10.3     |

Tidyverse:

|origin|arr_delay |dep_delay |
|:------|:--------:|:--------:|
|EWR   | 9.56     | **15.1**     |
|JFK   | 5.85     | **12.1**     |
|LGA   | 6.11     | 10.3     |

The difference is small, but I do not understand how it is possible. Can someone explain why these outputs are different? Is it due to the NA values?

Thanks in advance!

akrun · Accepted Answer

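The difference is small enough that it first looks like a rounding issue, since tibble applies its own print formatting. The actual cause is NA handling. The formula method of aggregate has an na.action argument that defaults to na.omit, which removes a row completely if any variable in the formula is NA in that row. A flight with a missing arr_delay is therefore also excluded from the dep_delay mean, whereas mean(..., na.rm = TRUE) inside summarise drops NAs column by column. Set na.action to NULL and pass na.rm = TRUE through to mean: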

aggregate(cbind(arr_delay, dep_delay) ~ origin, data = flights,
          FUN = mean, na.rm = TRUE,  # na.rm = TRUE is passed on to mean()
          na.action = NULL)          # keep rows that contain NAs

Output:

 origin arr_delay dep_delay
1    EWR  9.107055  15.10795
2    JFK  5.551481  12.11216
3    LGA  5.783488  10.34688
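To see the row-dropping behaviour in isolation, here is a minimal sketch on a small hypothetical data frame (toy, not the real flights data), assuming only base R. One row has a departure delay but no arrival delay, and another has the reverse.

```r
# Hypothetical toy data: row 2 lacks arr_delay, row 5 lacks dep_delay.
toy <- data.frame(
  origin    = c("A", "A", "A", "B", "B"),
  arr_delay = c(10, NA, 20, 5, 15),
  dep_delay = c(12, 30, 18, 4, NA)
)

# Default na.action = na.omit: any row with an NA in arr_delay OR dep_delay
# is dropped before grouping, so the dep_delay of 30 for "A" is ignored.
by_default <- aggregate(cbind(arr_delay, dep_delay) ~ origin,
                        data = toy, FUN = mean)
print(by_default)

# na.action = NULL keeps every row; na.rm = TRUE is forwarded to mean(),
# so each column drops only its own NAs.
per_column <- aggregate(cbind(arr_delay, dep_delay) ~ origin,
                        data = toy, FUN = mean,
                        na.rm = TRUE, na.action = NULL)
print(per_column)
```

With the default, the dep_delay mean for "A" is (12 + 18) / 2 = 15; with na.action = NULL it is (12 + 30 + 18) / 3 = 20, which is the same kind of shift seen between the two tables in the question.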

To confirm that there are NA values:

 colSums(is.na(flights[c('arr_delay', 'dep_delay')]))
arr_delay dep_delay 
     9430      8255
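Going the other way, the tidyverse pipeline can reproduce aggregate's default by removing incomplete rows before grouping. The sketch below assumes dplyr is installed and again uses a hypothetical toy data frame rather than flights; the filter() step stands in for na.omit.

```r
library(dplyr)

# Hypothetical toy data: row 2 lacks arr_delay, row 5 lacks dep_delay.
toy <- data.frame(
  origin    = c("A", "A", "A", "B", "B"),
  arr_delay = c(10, NA, 20, 5, 15),
  dep_delay = c(12, 30, 18, 4, NA)
)

# Mimic aggregate()'s na.action = na.omit: drop every row with an NA in
# any summarised column, so mean() no longer needs na.rm.
complete_only <- toy %>%
  filter(!is.na(arr_delay), !is.na(dep_delay)) %>%
  group_by(origin) %>%
  summarise(arr_delay = mean(arr_delay),
            dep_delay = mean(dep_delay))

# The question's approach: each column drops only its own NAs.
per_column <- toy %>%
  group_by(origin) %>%
  summarise(arr_delay = mean(arr_delay, na.rm = TRUE),
            dep_delay = mean(dep_delay, na.rm = TRUE))

print(complete_only)
print(per_column)
```

Which version is "right" depends on the analysis: per-column NA removal uses all available departure delays, while complete-case filtering keeps both averages on exactly the same set of flights.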


Tags: r, aggregate, mean, tidyverse

Wutruvic
