
constructing an identifier string for each row in data

I have the following data:

library(data.table)
d = data.table(a = c(1:3), b = c(2:4))

and would like to get this result (in a way that would work with an arbitrary number of columns):

d[, c := paste0('a_', a, '_b_', b)]
d
#   a b       c
#1: 1 2 a_1_b_2
#2: 2 3 a_2_b_3
#3: 3 4 a_3_b_4

The following works, but I'm hoping to find something shorter and more legible.

d = data.table(a = c(1:3), b = c(2:4))
d[, c := apply(mapply(paste, names(.SD), .SD, MoreArgs = list(sep = "_")),
               1, paste, collapse = "_")]
eddi asked Jul 02 '13 19:07



2 Answers

One way, only slightly cleaner:

d[, c := apply(d, 1, function(x) paste(names(d), x, sep = "_", collapse = "_"))]

#   a b       c
#1: 1 2 a_1_b_2
#2: 2 3 a_2_b_3
#3: 3 4 a_3_b_4
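
For comparison, the same result can be built column by column, which avoids the row-wise apply() and the matrix conversion it performs. This is only a minimal sketch: make_id is a hypothetical helper name (not part of data.table), and a base R data.frame stands in for the table so the snippet runs without extra packages.

```r
# make_id is a hypothetical helper name, not part of data.table.
make_id <- function(cols) {
  # Pair each column name with its values: "a_1", "a_2", ...
  pieces <- Map(function(nm, col) paste(nm, col, sep = "_"), names(cols), cols)
  # Join the per-column pieces element-wise with one vectorised paste() call.
  # unname() keeps column names from being matched to paste()'s own arguments.
  do.call(paste, c(unname(pieces), sep = "_"))
}

d <- data.frame(a = 1:3, b = 2:4)  # stand-in; a data.table is also a list of columns
id <- make_id(d)
print(id)
```

With a data.table, the helper would presumably be called as d[, c := make_id(.SD)].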
Ricardo Saporta answered Oct 02 '22 11:10


Here is an approach using do.call('paste') that requires only a single call to paste.

I will benchmark on a situation where the columns are integers (as this seems a more sensible test case).

library(data.table)
library(microbenchmark)

N <- 1e4

d <- setnames(as.data.table(replicate(5, sample(N), simplify = FALSE)), letters[seq_len(5)])

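f5 <- function(d){
  l <- length(d)
  # positions that interleave names and columns: name 1, column 1, name 2, column 2, ...
  o <- c(1L, l + 1L) + rep(seq_len(l) - 1L, each = 2L)
  # a single paste() call over the interleaved list
  do.call('paste', c(c(as.list(names(d)), d)[o], sep = '_'))
}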
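
The interleaving index can be checked by hand on the two-column table from the question. The following is a minimal sketch, assuming the intended order is name 1, column 1, name 2, column 2, and so on; interleave_idx is a hypothetical helper name, it builds the offsets with rep(..., each = 2L), and a base R data.frame stands in for the data.table.

```r
# Hypothetical helper: positions that interleave the first l elements (names)
# with the last l elements (columns) of a list of length 2 * l.
interleave_idx <- function(l) {
  c(1L, l + 1L) + rep(seq_len(l) - 1L, each = 2L)
}

d <- data.frame(a = 1:3, b = 2:4)  # stand-in; a data.table behaves the same here
l <- length(d)

idx <- interleave_idx(l)                 # for l = 2 this is 1, 3, 2, 4
parts <- c(as.list(names(d)), d)[idx]    # "a", d$a, "b", d$b
id <- do.call(paste, c(unname(parts), sep = "_"))
print(id)
```

Printing idx for a small l is a quick way to confirm that each name lands directly before its own column.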


# f1() and f2() are the other implementations being compared; they are not defined in this answer
microbenchmark(f1(d), f2(d), f5(d))
Unit: milliseconds
  expr       min        lq    median        uq       max neval
 f1(d)  41.51040  43.88348  44.60718  45.29426  52.83682   100
 f2(d) 193.94656 207.20362 210.88062 216.31977 252.11668   100
 f5(d)  30.73359  31.80593  32.09787  32.64103  45.68245   100
mnel answered Oct 02 '22 10:10