constructing an identifier string for each row in data

Tags: string, r, data.table

I have the following data:

library(data.table)
d = data.table(a = c(1:3), b = c(2:4))

and would like to get this result (in a way that would work with an arbitrary number of columns):

d[, c := paste0('a_', a, '_b_', b)]
d
#   a b       c
#1: 1 2 a_1_b_2
#2: 2 3 a_2_b_3
#3: 3 4 a_3_b_4

The following works, but I'm hoping to find something shorter and more legible.

d = data.table(a = c(1:3), b = c(2:4))
d[, c := apply(mapply(paste, names(.SD), .SD, MoreArgs = list(sep = "_")),
               1, paste, collapse = "_")]

asked Jul 02 '13 19:07

eddi

2 Answers

One way, only slightly cleaner:

d[, c := apply(d, 1, function(x) paste(names(d), x, sep = "_", collapse = "_"))]

   a b       c
1: 1 2 a_1_b_2
2: 2 3 a_2_b_3
3: 3 4 a_3_b_4
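
A minimal sketch of the same idea, assuming you want to build the identifier from a chosen set of columns only, so that re-running the line does not fold an already created identifier column back into itself. Here `.SD` with `.SDcols` stands in for `d` inside the call. The `cols` vector, the extra column `z`, and the `id` column name are made up for illustration.

```r
library(data.table)

# Hypothetical three-column table, to show it is not tied to two columns
d <- data.table(a = 1:3, b = 2:4, z = 3:5)

# Columns that should go into the identifier (an assumption for this sketch)
cols <- c("a", "b", "z")

# apply() walks over the rows of .SD; each row x is pasted to the column names
d[, id := apply(.SD, 1, function(x) paste(cols, x, sep = "_", collapse = "_")),
  .SDcols = cols]

d$id
# "a_1_b_2_z_3" "a_2_b_3_z_4" "a_3_b_4_z_5"
```

This is still a row-by-row `apply`, so it is likely to be slow on large tables.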

answered Oct 02 '22 11:10

Ricardo Saporta

Here is an approach using do.call('paste'), but requiring only a single call to paste.

I will benchmark on a situation where the columns are integers (as this seems a more sensible test case).

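library(data.table)
library(microbenchmark)

N <- 1e4

d <- setnames(as.data.table(replicate(5, sample(N), simplify = FALSE)), letters[seq_len(5)])

f5 <- function(d){
  l <- length(d)
  # interleave names and columns: name 1, column 1, name 2, column 2, ...
  o <- c(1L, l + 1L) + rep(seq_len(l) - 1L, each = 2L)
  # a single vectorised paste() call over the interleaved list
  do.call('paste', c((c(as.list(names(d)), d))[o], sep = '_'))
}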


microbenchmark(f1(d), f2(d),f5(d))
Unit: milliseconds
  expr       min        lq    median        uq       max neval
 f1(d)  41.51040  43.88348  44.60718  45.29426  52.83682   100
 f2(d) 193.94656 207.20362 210.88062 216.31977 252.11668   100
 f5(d)  30.73359  31.80593  32.09787  32.64103  45.68245   100
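
To see what the single `paste` call receives, here is a small check on the two-column table from the question. It is only a sketch: the index is built with `rbind` rather than with the arithmetic used in `f5`, and the names `parts`, `o`, and `res` are made up for illustration.

```r
library(data.table)

d <- data.table(a = 1:3, b = 2:4)
l <- length(d)

# Column names as a list, followed by the columns themselves:
# positions 1..l hold the names, positions (l + 1)..(2 * l) hold the columns
parts <- c(as.list(names(d)), d)

# Interleave the two halves: name 1, column 1, name 2, column 2, ...
o <- as.vector(rbind(seq_len(l), l + seq_len(l)))
o
# 1 3 2 4

# One vectorised paste() over the interleaved list; the names recycle per row
res <- do.call(paste, c(parts[o], sep = "_"))
res
# "a_1_b_2" "a_2_b_3" "a_3_b_4"
```

Because `paste` is vectorised over its arguments, the length-one names are recycled against the full columns, which is why no row-wise loop is needed.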

answered Oct 02 '22 10:10

mnel
