Say for example I have a data frame with a column a
and I want to create columns a^i
for several values of i
.
> dat <- data.frame(a=1:5)
> dat
a
1 1
2 2
3 3
4 4
5 5
As an example, the output I want for i=2:5
:
a power_2 power_3 power_4 power_5
1 1 1 1 1 1
2 2 4 8 16 32
3 3 9 27 81 243
4 4 16 64 256 1024
5 5 25 125 625 3125
Currently I get this output with data.table
as follows:
DT <- data.table(dat)
exponents <- 2:5
DT[, paste0("power_",exponents):=lapply(exponents, function(p) a^p)]
How to do with plyr
/dplyr
? Of course I could do as below by typing power_i=a^i
for each i
but this is not what I want.
mutate(dat, power_2=a^2, power_3=a^3, ...)
Several answers have been proposed, and have been compared by @docendo discimus. I'm just adding the comparison with data.table
.
library(data.table)
library(dplyr)
set.seed(2015)
dat <- data.frame(a = sample(1000))
i <- 2:5
n <- c(names(dat), paste0("power_", i))
DT <- data.table(dat)
library(microbenchmark)
microbenchmark(
data.table = DT[, paste0("power_",i):=lapply(i, function(k) a^k)],
Henrik = dat %>% do(data.frame(., outer(.$a, i, `^`))) %>% setNames(n),
dd.do = dat %>% do(data.frame(., sapply(i, function(x) .$a^x))) %>% setNames(n),
dd.bc = dat %>% bind_cols(as.data.frame(lapply(i, function(x) .$a^x))) %>% setNames(n),
times = 30,
unit = "relative"
)
Unit: relative
expr min lq mean median uq max neval cld
data.table 1.022945 1.039674 1.108558 1.026319 1.083644 2.370180 30 a
Henrik 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 30 a
dd.do 1.149195 1.160735 1.167672 1.158141 1.150280 1.268279 30 a
dd.bc 14.350034 13.982658 13.737964 13.632361 13.606221 15.866711 30 b
Updated benchmark with two base
solutions, Henrik2 and josh (from his comment), which are fastest:
set.seed(2015)
dat <- data.frame(a = sample(1000))
microbenchmark(
data.table = DT[, paste0("power_",i):=lapply(i, function(k) a^k)],
Henrik = dat %>% do(data.frame(., outer(.$a, i, `^`))) %>% setNames(n),
Henrik2 = cbind(dat, outer(dat$a, setNames(i, paste0("power_", i)), `^`)),
dd.do = dat %>% do(data.frame(., sapply(i, function(x) .$a^x))) %>% setNames(n),
dd.bc = dat %>% bind_cols(as.data.frame(lapply(i, function(x) .$a^x))) %>% setNames(n),
josh = data.frame(dat, setNames(lapply(2:5, function(X) dat$a^X), paste0("power_", 2:5))),
times = 30,
unit = "relative"
)
# Unit: relative
# expr min lq mean median uq max neval cld
# data.table 1.991613 2.029778 1.982169 1.990417 1.946677 1.694030 30 bc
# Henrik 2.026345 2.017179 1.996419 2.003189 2.030176 1.733583 30 bc
# Henrik2 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 30 a
# dd.do 2.356886 2.375713 2.322452 2.348053 2.304826 2.101494 30 c
# dd.bc 37.445491 36.081298 34.791638 34.783854 34.787655 27.832116 30 d
# josh 1.725750 1.699887 1.641290 1.625331 1.637823 1.330598 30 b
One possibility is to use outer
in do
, and then set the names with setNames
i <- 2:5
dat %>%
do(data.frame(., outer(.$a, i, `^`))) %>%
setNames(., c("a", paste0("power_", i)))
# a power_2 power_3 power_4 power_5
# 1 1 1 1 1 1
# 2 2 4 8 16 32
# 3 3 9 27 81 243
# 4 4 16 64 256 1024
# 5 5 25 125 625 3125
If you name the 'power vector' "i" first, you can call cbind
instead of do
and data.frame
, and I see no immediate need for dplyr
functions in this particular case.
cbind(dat, outer(dat$a, setNames(i, paste0("power_", i)), `^`))
# a power_2 power_3 power_4 power_5
# 1 1 1 1 1 1
# 2 2 4 8 16 32
# 3 3 9 27 81 243
# 4 4 16 64 256 1024
# 5 5 25 125 625 3125
The base
, non-do
code is faster for your larger sample data. I also added the base
solution by @Josh O'Brien.
set.seed(2015)
dat <- data.frame(a = sample(1000))
microbenchmark(
data.table = DT[, paste0("power_",i):=lapply(i, function(k) a^k)],
Henrik = dat %>% do(data.frame(., outer(.$a, i, `^`))) %>% setNames(n),
Henrik2 = cbind(dat, outer(dat$a, setNames(i, paste0("power_", i)), `^`)),
dd.do = dat %>% do(data.frame(., sapply(i, function(x) .$a^x))) %>% setNames(n),
dd.bc = dat %>% bind_cols(as.data.frame(lapply(i, function(x) .$a^x))) %>% setNames(n),
josh = data.frame(dat, setNames(lapply(2:5, function(X) dat$a^X), paste0("power_", 2:5))),
times = 30,
unit = "relative"
)
# Unit: relative
# expr min lq mean median uq max neval cld
# data.table 1.991613 2.029778 1.982169 1.990417 1.946677 1.694030 30 bc
# Henrik 2.026345 2.017179 1.996419 2.003189 2.030176 1.733583 30 bc
# Henrik2 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 30 a
# dd.do 2.356886 2.375713 2.322452 2.348053 2.304826 2.101494 30 c
# dd.bc 37.445491 36.081298 34.791638 34.783854 34.787655 27.832116 30 d
# josh 1.725750 1.699887 1.641290 1.625331 1.637823 1.330598 30 b
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With