data.table alternative to piping

I'm currently learning the robust and efficient data.table package, but I can't figure out how to do the following. I want to group by multiple columns (manufacturer and carrier), count the number of flights in each group, sort the counts in descending order, and then plot the top 10 manufacturer/carrier combinations with ggplot. In the tidyverse I would do it like this:

library(nycflights13)
library(tidyverse)
flights %>% 
  left_join(planes, by = "tailnum") %>% 
  group_by(manufacturer, carrier) %>% 
  summarise(N = n(), .groups = "drop") %>%   # drop the grouping so top_n() ranks all combinations
  arrange(desc(N)) %>% 
  top_n(10, N) %>% 
  ggplot(aes(carrier, N, fill = manufacturer)) + geom_col() + guides(fill = "none")  # hide the fill legend

Here is what I've tried (I stepped away from the question for several minutes to try to solve it myself, but failed):

library(data.table)
fly <- copy(nycflights13::flights)
setDT(fly)
setkey(fly, tailnum)
planes1 <- copy(nycflights13::planes)
setDT(planes1)
setkey(planes1, tailnum)
#head(planes1, 2)
# Note: this is an inner join; add all.x = TRUE to match left_join()
Merged <- merge(fly, planes1, by = "tailnum")
# Count flights per manufacturer/carrier combination
Merged[, .N, by = .(manufacturer, carrier)] #[, order(manufacturer, carrier)]

The problem is that I can't get it to return the sorted data, and I also don't know how to "chain" the result into ggplot without first saving the sorted, merged table as an object.
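A minimal sketch of why the commented-out `[, order(manufacturer, carrier)]` in the attempt above does not return sorted data. It uses a made-up toy table (`dt` and its values are hypothetical, not from nycflights13): inside `j`, `order()` is simply evaluated and its integer result is returned, whereas the same call placed in `i` selects the rows in that order.

```r
library(data.table)

# Hypothetical toy counts, standing in for the manufacturer/carrier table
dt <- data.table(carrier = c("AA", "DL", "UA"), N = c(5L, 9L, 2L))

# In j: order() is evaluated and its result (row positions) is returned as a plain vector
idx <- dt[, order(-N)]

# In i: the same ordering reorders the rows of dt
sorted <- dt[order(-N)]

print(idx)     # integer vector of row positions
print(sorted)  # rows sorted by N, descending
```

If you would rather sort in place, `setorder(dt, -N)` reorders the table by reference without making a copy.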

asked Jan 13 '19 10:01 by NelsonGon



1 Answer

In data.table you can chain operations by appending further square-bracket calls, `DT[...][...]`, where each call operates on the result of the previous one. You can also call ggplot inside the j part of the data.table syntax, passing `.SD` as the data:

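library(data.table)
library(ggplot2)

# fly and planes1 are the data.tables created in the question
nms <- setdiff(names(planes1), "tailnum")   # all plane columns except the join key

fly[planes1, on = .(tailnum), (nms) := mget(nms)   # update join: add the plane columns to fly by reference
    ][, .N, by = .(manufacturer, carrier)          # count flights per manufacturer/carrier
      ][order(-N)                                  # sort by count, descending
        ][1:10                                     # keep the 10 largest combinations
          ][, ggplot(.SD, aes(carrier, N, fill = manufacturer)) +   # plot inside j
              geom_col() +
              guides(fill = "none")]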

which gives a bar chart of `N` per carrier, with the bars filled by manufacturer.
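A small sketch of the two pieces this answer relies on, run on hypothetical toy tables (`flights_toy` and `planes_toy` are made up for illustration, not nycflights13 data). The update join `x[i, on = ..., col := i.col]` adds a column from `i` to `x` by reference, and rows of `x` without a match get `NA`, much like `left_join()`. A `[...][...]` chain then gives the same result as saving each step to an intermediate object. Note that this sketch uses the explicit `i.` prefix rather than the answer's `mget(nms)`.

```r
library(data.table)

# Hypothetical toy data (not nycflights13)
flights_toy <- data.table(tailnum = c("N1", "N1", "N1", "N2", "N2", "N3"),
                          carrier = c("AA", "AA", "AA", "DL", "DL", "UA"))
planes_toy  <- data.table(tailnum = c("N1", "N2"),
                          manufacturer = c("BOEING", "AIRBUS"))

# Update join: copy manufacturer from planes_toy into flights_toy by reference.
# N3 has no match in planes_toy, so its manufacturer stays NA (like left_join).
flights_toy[planes_toy, on = .(tailnum), manufacturer := i.manufacturer]

# Chained version: count, sort descending, keep the top 2 rows
chained <- flights_toy[, .N, by = .(manufacturer, carrier)][order(-N)][1:2]

# Step-by-step version with intermediate objects
counts   <- flights_toy[, .N, by = .(manufacturer, carrier)]
ordered  <- counts[order(-N)]
stepwise <- ordered[1:2]

print(chained)
```

The `i.` prefix makes explicit which table a column comes from, which matters when both tables share a column name, as `flights` and `planes` do with `year`.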

answered Sep 30 '22 20:09 by Jaap
