
data.table alternative for dplyr mutate?

Tags: r, data.table

I'm learning R and I'm not sure whether it makes sense to standardise on dplyr or data.table. dplyr has really nice syntax, but as far as I understand, it copies the data frame on each operation, which is (or could be) a drawback.

One thing I can't figure out is the data.table alternative to mutate.

If I have

df %>% group_by(foo) %>% mutate(
    bar  = cumsum(baz),
    q    = bar * 3.14)

I could do something like

df[,c("bar"):=list(cumsum(baz)),by=foo]
df$q <- df$bar*3.14

Is there a better way of doing this in data.table?

zapp0 asked Apr 11 '15 22:04



1 Answer

You may do just this:

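library(data.table)

# some test data:
df <- data.table(baz = 1:10, foo = c(rep(1, 5), rep(2, 5)))

# cumulative sum of baz within each foo group, added by reference
df[, bar := cumsum(baz), by = foo]
# q depends on bar, so it is computed in a second step
df[, q := bar * 3.14]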

Although it takes two lines, it is very readable and easy to write.
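If a single expression is preferred, the two steps can also be combined. The following is only a sketch of two possible variants, using the same test data; the names dt, dt2 and b are illustrative and not part of the original question, and neither variant is claimed to be faster than the two-line version.

```r
library(data.table)

# Same test data as above, built twice so each variant starts from a clean table
# (dt and dt2 are illustrative names).
dt  <- data.table(baz = 1:10, foo = c(rep(1, 5), rep(2, 5)))
dt2 <- copy(dt)

# Variant 1: chain two [ ] calls; := returns the table invisibly, so it can be chained.
dt[, bar := cumsum(baz), by = foo][, q := bar * 3.14]

# Variant 2: assign both columns in one grouped call.
dt2[, c("bar", "q") := {
  b <- cumsum(baz)      # per-group running total
  list(b, b * 3.14)     # one list element per new column
}, by = foo]
```

Both variants add the columns by reference, so no assignment with <- is needed. The chained form stays closest to the dplyr pipeline in the question, while the braced form keeps everything inside one grouped call.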

sthelen answered Oct 03 '22 22:10