Computing on multiple column names in a data.table

Tags: r, data.table

I have a sparse data table that looks like this:

library(data.table)

data = data.table(
    var1 = c("a","",""),
    var2 = c("","","c"),
    var3 = c("a","b",""),
    var4 = c("","b","")
)
   var1 var2 var3 var4
1:    a         a
2:              b    b
3:         c

I would like to add a column containing a string of zeros and ones that indicates which variables are present in each row, like this:

   var1 var2 var3 var4  concat
1:    a         a      1|0|1|0
2:              b    b 0|0|1|1
3:         c           0|1|0|0

I can get to this with the following command:

data[, concat := paste(
           as.integer(var1 != ""),
           as.integer(var2 != ""),
           as.integer(var3 != ""),
           as.integer(var4 != ""),
           sep = "|")]

However, if I have hundreds of variables, I would rather use some sort of computation to get to the desired expression. Perhaps something based on paste0("var",1:4), or at least a vector of column names. Any suggestions?

asked Aug 11 '16 23:08

BigFinger

3 Answers

Same basic approach as the nested apply answer, but done row by row inside data.table:

data[ , concat := apply(.SD, 1, function(x) paste(+(x != ""), collapse = "|"))][]
#    var1 var2 var3 var4  concat
# 1:    a         a      1|0|1|0
# 2:              b    b 0|0|1|1
# 3:         c           0|1|0|0

answered Sep 21 '22 14:09

MichaelChirico

Variation not requiring any grouping by each row or apply-ing on each row.

data[, concat := do.call(paste, c(lapply(.SD, function(x) (x!="")+0 ), sep="|")) ]

#   var1 var2 var3 var4  concat
#1:    a         a      1|0|1|0
#2:              b    b 0|0|1|1
#3:         c           0|1|0|0
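
To limit this to a chosen set of columns, which is what the question asks for with paste0("var",1:4), .SDcols can be supplied. A minimal sketch, assuming the columns are named var1 to var4 as in the question; `cols` is a name introduced here for illustration only:

```r
library(data.table)

data <- data.table(
  var1 = c("a", "", ""),
  var2 = c("", "", "c"),
  var3 = c("a", "b", ""),
  var4 = c("", "b", "")
)

# `cols` is an illustrative name: the vector of column names to test.
cols <- paste0("var", 1:4)

# .SDcols limits .SD to `cols`. Each column becomes a 0/1 integer vector,
# and do.call(paste, ...) pastes those vectors together element-wise,
# which yields one string per row without looping over rows.
data[, concat := do.call(paste, c(lapply(.SD, function(x) as.integer(x != "")), sep = "|")),
     .SDcols = cols]

print(data)
```

Because only the columns in `cols` are inspected, other columns in the table (including an existing concat) are left out of the string.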

answered Sep 24 '22 14:09

thelatemail

data$concat <- apply(apply(data, 2, function(x) ifelse(x == "", 0, 1)), 1, function(x) paste(x, collapse="|"))

Breakdown:

1) For each column in data, check whether each element is empty; return 0 if it is, else 1.

apply(data, 2, function(x) ifelse(x == "", 0, 1))

2) Call the result of (1) concat. For each row of concat, paste the values together with a pipe (|) separating them, and assign the result to the new column of data.

apply(concat, 1, function(x) paste(x, collapse="|"))
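
The two steps can also be run as separate statements. A sketch, assuming data holds only the four character columns from the question (no concat column yet); `flags` is a name introduced here for the intermediate matrix, to keep it distinct from the new column:

```r
library(data.table)

data <- data.table(
  var1 = c("a", "", ""),
  var2 = c("", "", "c"),
  var3 = c("a", "b", ""),
  var4 = c("", "b", "")
)

# Step 1: build a 0/1 matrix with one row per row of data
# and one column per variable (`flags` is an illustrative name).
flags <- apply(data, 2, function(x) ifelse(x == "", 0, 1))

# Step 2: collapse each row of the matrix into a pipe-separated string.
data$concat <- apply(flags, 1, function(x) paste(x, collapse = "|"))

print(data)
```

Note that with a one-row table, step 1 returns a plain vector rather than a matrix, so step 2 would fail; this sketch assumes at least two rows.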

answered Sep 21 '22 14:09

TomNash
