How can I fit multiple models by group using data.table syntax? I want my output to be a data.frame with columns for each "by group" and one column for each model fit. Currently I am able to do this using the dplyr package, but can't do this in data.table.
# example data frame
library(data.table)
N <- 100  # assumption: N is not defined in the original post
df <- data.table(
  id = sample(c("id01", "id02", "id03"), N, TRUE),
  v1 = sample(5, N, TRUE),
  v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
)
# equivalent code in dplyr
library(dplyr)
group_by(df, id) %>%
  do(model1 = lm(v1 ~ v2, .),
     model2 = lm(v2 ~ v1, .))
# attempt in data.table (does not produce one model per cell)
df[, .(model1 = lm(v1 ~ v2, .SD), model2 = lm(v2 ~ v1, .SD)), by = id]
# Brodie G's solution
df[, .(model1 = list(lm(v1~v2, .SD)), model2 = list(lm(v2~v1, .SD))), by = id ]
data.table is an R package that extends data.frame. It is widely used for fast aggregation of large datasets, low-latency adding, updating and removing of columns, quicker ordered joins, and a fast file reader.
.SD stands for "Subset of Data.table". Inside j, it is itself a data.table holding the rows of the current group. The leading dot has no special meaning; it only makes a clash with a user-defined column name unlikely.
The package provides the efficient data.table object, a much improved version of the default data.frame. It is very fast and has a terse syntax.
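To make the role of .SD concrete, here is a minimal sketch on a hypothetical toy table (the names `dt`, `x`, `y` and `sd_info` are illustrative and not from the question). It only inspects what .SD contains for each group.

```r
library(data.table)

# Hypothetical toy data, not from the question
dt <- data.table(
  id = c("a", "a", "b", "b", "b"),
  x  = 1:5,
  y  = c(2.0, 4.1, 1.0, 2.9, 5.2)
)

# Within each group, .SD is a data.table of that group's rows,
# containing every column except those named in `by`
sd_info <- dt[, .(n_rows = nrow(.SD),
                  cols   = paste(names(.SD), collapse = ",")),
              by = id]
print(sd_info)
```

Group "a" should report 2 rows and group "b" 3 rows, both with columns x and y, which is why `lm(v1 ~ v2, .SD)` fits one model per id.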
n(): in dplyr, the number of observations in the current group. This function is implemented specifically for each data source and can only be used from within summarise(), mutate() and filter().
Try:
df[, .(model1 = list(lm(v1~v2, .SD)), model2 = list(lm(v2~v1, .SD))), by = id ]
or slightly more idiomatically:
formulas <- list(v1~v2, v2~v1)
df[, lapply(formulas, function(x) list(lm(x, data=.SD))), by=id]
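Wrapping each fit in list() works because a fitted lm object is itself a list, and data.table needs it boxed to store it as a single cell of a list column. As a rough sketch of how the result might be used afterwards (the seed, the value of N and the names `fits` and `slopes` are assumptions for illustration, not part of the question):

```r
library(data.table)

set.seed(1)  # assumption: seed chosen only for reproducibility
N <- 100     # assumption: N is not defined in the question
df <- data.table(
  id = sample(c("id01", "id02", "id03"), N, TRUE),
  v1 = sample(5, N, TRUE),
  v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
)

# One row per id, one list column per model
fits <- df[, .(model1 = list(lm(v1 ~ v2, .SD)),
               model2 = list(lm(v2 ~ v1, .SD))), by = id]

# Each cell of a list column holds one fitted lm object
class(fits$model1[[1]])

# Pull the v2 slope of model1 out of each group's fit
slopes <- fits[, .(id, slope1 = sapply(model1, function(m) coef(m)[["v2"]]))]
print(slopes)
```

Extracting plain numbers such as coefficients right away keeps the summary table easy to join or print, while the full model objects stay available in `fits`.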