
Computing on multiple column names in a data.table

Tags: r, data.table

I have a sparse data table that looks like this:

data = data.table(
    var1 = c("a","",""),
    var2 = c("","","c"),
    var3 = c("a","b",""),
    var4 = c("","b","")
)
   var1 var2 var3 var4
1:    a         a
2:              b    b
3:         c

I would like to add a column that contains a string of zeros and ones indicating which variables are present in any row, like this:

  var1 var2 var3 var4  concat
1:    a         a      1|0|1|0
2:              b    b 0|0|1|1
3:         c           0|1|0|0

I can get to this with the following command:

data[, concat := paste(
           as.integer(var1 != ""),
           as.integer(var2 != ""),
           as.integer(var3 != ""),
           as.integer(var4 != ""),
           sep = "|")]

However, if I have hundreds of variables, I would rather use some sort of computation to get to the desired expression. Perhaps something based on paste0("var",1:4), or at least a vector of column names. Any suggestions?

BigFinger asked Aug 11 '16 23:08


3 Answers

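Same basic approach as in the question, applied row-wise over all columns. Note the test is x != "", so that a non-empty value maps to 1:

data[ , concat := apply(.SD, 1, function(x) paste(+(x != ""), collapse = "|"))][]
#    var1 var2 var3 var4  concat
# 1:    a         a      1|0|1|0
# 2:              b    b 0|0|1|1
# 3:         c           0|1|0|0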
MichaelChirico answered Sep 21 '22 14:09


A variation that avoids grouping by row or calling apply on every row: it converts each column to 0/1 and then pastes the resulting vectors together in one vectorised call.

data[, concat := do.call(paste, c(lapply(.SD, function(x) (x!="")+0 ), sep="|")) ]

#   var1 var2 var3 var4  concat
#1:    a         a      1|0|1|0
#2:              b    b 0|0|1|1
#3:         c           0|1|0|0
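
The question asks for something driven by a vector of column names, such as paste0("var", 1:4). A minimal sketch of that, assuming the columns of interest are exactly var1 through var4, passes the vector through data.table's .SDcols argument so that only those columns reach .SD. The name cols is introduced here for illustration and does not appear in the original posts.

```r
library(data.table)

data <- data.table(
  var1 = c("a", "", ""),
  var2 = c("", "", "c"),
  var3 = c("a", "b", ""),
  var4 = c("", "b", "")
)

# Assumption: the target columns follow the pattern var1..var4
cols <- paste0("var", 1:4)

# Convert each selected column to 0/1, then paste the columns together row by row
data[, concat := do.call(paste, c(lapply(.SD, function(x) as.integer(x != "")), sep = "|")),
     .SDcols = cols]
```

Restricting .SD this way also means that re-running the line after concat exists will not feed the concat column back into the computation.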
thelatemail answered Sep 24 '22 14:09


data$concat <- apply(apply(data, 2, function(x) ifelse(x == "", 0, 1)), 1, function(x) paste(x, collapse="|"))

Breakdown:

1) For each column in data, check whether each element is empty; return 0 if it is, 1 otherwise. This yields a 0/1 matrix with the same shape as data.

apply(data, 2, function(x) ifelse(x == "", 0, 1))

2) Call the result of (1) concat. For each row of concat, paste the values together with a pipe (|) separating them, and assign the result to the new column of data.

apply(concat, 1, function(x) paste(x, collapse="|"))
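
The two steps can also be run separately, which makes the intermediate matrix easy to inspect. This is only a sketch of the breakdown above on the question's example data, using the intermediate name concat from step 2.

```r
library(data.table)

data <- data.table(
  var1 = c("a", "", ""),
  var2 = c("", "", "c"),
  var3 = c("a", "b", ""),
  var4 = c("", "b", "")
)

# Step 1: 0/1 matrix, one column per column of data
concat <- apply(data, 2, function(x) ifelse(x == "", 0, 1))

# Step 2: collapse each row of the matrix into a pipe-separated string
data$concat <- apply(concat, 1, function(x) paste(x, collapse = "|"))
```

Because apply(data, 2, ...) walks over every column, step 1 should run before the concat column has been added to data.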
TomNash answered Sep 21 '22 14:09