I have the following data:
library(data.table)
d = data.table(a = c(1:3), b = c(2:4))
and would like to get this result (in a way that would work with arbitrary number of columns):
d[, c := paste0('a_', a, '_b_', b)]
d
# a b c
#1: 1 2 a_1_b_2
#2: 2 3 a_2_b_3
#3: 3 4 a_3_b_4
The following works, but I'm hoping to find something shorter and more legible.
d = data.table(a = c(1:3), b = c(2:4))
d[, c := apply(mapply(paste, names(.SD), .SD, MoreArgs = list(sep = "_")),
1, paste, collapse = "_")]
Next, we can use the group_by and mutate functions of the dplyr package to assign a unique ID number to each group of identical values in a column (i.e. x1): By running the previous R programming syntax we have created the tibble shown in the previous RStudio console output.
In this example, I’ll demonstrate how to create a unique ID column by group using the dplyr package. First, we need to install and load the dplyr package: Next, we can use the group_by and mutate functions of the dplyr package to assign a unique ID number to each group of identical values in a column (i.e. x1):
Now we will use the DataFrame.iterrows () function to iterate over each of the row of the given Dataframe and construct a list out of the data of each row. As we can see in the output, we have successfully extracted each row of the given dataframe into a list.
And to begin with your Machine Learning Journey, join the Machine Learning - Basic Level Course Now we will use the DataFrame.iterrows () function to iterate over each of the row of the given Dataframe and construct a list out of the data of each row.
one way, only slightly cleaner:
d[, c := apply(d, 1, function(x) paste(names(d), x, sep="_", collapse="_")) ]
a b c
1: 1 2 a_1_b_2
2: 2 3 a_2_b_3
3: 3 4 a_3_b_4
Here is an approach using do.call('paste')
, but requiring only a single call to paste
I will benchmark on a situtation where the columns are integers (as this seems a more sensible test case
N <- 1e4
d <- setnames(as.data.table(replicate(5, sample(N), simplify = FALSE)), letters[seq_len(5)])
f5 <- function(d){
l <- length(d)
o <- c(1L, l + 1L) + rep_len(seq_len(l) -1L, 2L * l)
do.call('paste',c((c(as.list(names(d)),d))[o],sep='_'))}
microbenchmark(f1(d), f2(d),f5(d))
Unit: milliseconds
expr min lq median uq max neval
f1(d) 41.51040 43.88348 44.60718 45.29426 52.83682 100
f2(d) 193.94656 207.20362 210.88062 216.31977 252.11668 100
f5(d) 30.73359 31.80593 32.09787 32.64103 45.68245 100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With