I would like to express the following SQL query using base R (without any particular package):
select month, day, count(*) as count, avg(dep_delay) as avg_delay
from flights
group by month, day
having count > 1000
It selects the mean departure delay and the number of flights per day on busy days (days with more than 1000 flights). The dataset is nycflights13 containing information of flights departed from NYC in 2013.
Notice I can easily write this in dplyr as:
flights %>%
group_by(month, day) %>%
summarise(count = n(), avg_delay = mean(dep_delay, na.rm = TRUE)) %>%
filter(count > 1000)
Since I was reminded earlier about the elegance of by
(tip of the hat to @Parfait), here is a solution using by
:
res <- by(flights, list(flights$month, flights$day), function(x)
if (nrow(x) > 1000) {
c(
month = unique(x$month),
day = unique(x$day),
count = nrow(x),
avg_delay = mean(x$dep_delay, na.rm = TRUE))
})
# Store in data.frame and order by month, day
df <- do.call(rbind, res);
df <- df[order(df[, 1], df[, 2]) ,];
# month day count avg_delay
#[1,] 7 8 1004 37.296646
#[2,] 7 9 1001 30.711499
#[3,] 7 10 1004 52.860702
#[4,] 7 11 1006 23.609392
#[5,] 7 12 1002 25.096154
#[6,] 7 17 1001 13.670707
#[7,] 7 18 1003 20.626789
#[8,] 7 25 1003 19.674134
#[9,] 7 31 1001 6.280843
#[10,] 8 7 1001 8.680402
#[11,] 8 8 1001 43.349947
#[12,] 8 12 1001 8.308157
#[13,] 11 27 1014 16.697651
#[14,] 12 2 1004 9.021978
as commented you can use a combi of subset
and aggregate
. Changed the order of day & month to recieve the same order as your dplyr
approach. Using na.action = NULL
to count rows inclunding NA
s.
library(nycflights13)
#> Warning: Paket 'nycflights13' wurde unter R Version 3.4.4 erstellt
subset(aggregate(dep_delay ~ day + month, flights,
function(x) cbind(count=length(x), avg_delay=mean(x, na.rm = TRUE)),
na.action = NULL),
dep_delay[,1] > 1000)
#> day month dep_delay.1 dep_delay.2
#> 189 8 7 1004.000000 37.296646
#> 190 9 7 1001.000000 30.711499
#> 191 10 7 1004.000000 52.860702
#> 192 11 7 1006.000000 23.609392
#> 193 12 7 1002.000000 25.096154
#> 198 17 7 1001.000000 13.670707
#> 199 18 7 1003.000000 20.626789
#> 206 25 7 1003.000000 19.674134
#> 212 31 7 1001.000000 6.280843
#> 219 7 8 1001.000000 8.680402
#> 220 8 8 1001.000000 43.349947
#> 224 12 8 1001.000000 8.308157
#> 331 27 11 1014.000000 16.697651
#> 336 2 12 1004.000000 9.021978
Created on 2018-04-05 by the reprex package (v0.2.0).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With