I have a sparse data table that looks like this:
data = data.table(
var1 = c("a","",""),
var2 = c("","","c"),
var3 = c("a","b",""),
var4 = c("","b","")
)
var1 var2 var3 var4
1: a a
2: b b
3: c
I would like to add a column that contains a string of zeros and ones indicating which variables are present in any row, like this:
var1 var2 var3 var4 concat
1: a a 1|0|1|0
2: b b 0|0|1|1
3: c 0|1|0|0
I can get to this with the following command:
data[, concat := paste(
as.integer(var1 != ""),
as.integer(var2 != ""),
as.integer(var3 != ""),
as.integer(var4 != ""),
sep = "|")]
However, if I have hundreds of variables, I would rather use some sort of computation to get to the desired expression. Perhaps something based on paste0("var",1:4)
, or at least a vector of column names. Any suggestions?
Go to the Data tab > Data Tools group, click the What-If Analysis button, and then click Data Table… In the Data Table dialog window, click in the Column Input cell box (because our Investment values are in a column), and select the variable cell referenced in your formula.
To pick out single or multiple columns use the select() function. The select() function expects a dataframe as it's first input ('argument', in R language), followed by the names of the columns you want to extract with a comma between each name.
By using the same cbin() function you can add multiple columns to the DataFrame in R.
Same basic approach as above:
data[ , concat := apply(.SD, 1, function(x) paste(+(x == ""), collapse = "|"))][]
# var1 var2 var3 var4 concat
# 1: a a 0|1|0|1
# 2: b b 1|1|0|0
# 3: c 1|0|1|1
Variation not requiring any grouping by each row or apply
-ing on each row.
data[, concat := do.call(paste, c(lapply(.SD, function(x) (x!="")+0 ), sep="|")) ]
# var1 var2 var3 var4 concat
#1: a a 1|0|1|0
#2: b b 0|0|1|1
#3: c 0|1|0|0
data$concat <- apply(apply(data, 2, function(x) ifelse(x == "", 0, 1)), 1, function(x) paste(x, collapse="|"))
Breakdown:
1) For each column in data
, check if element is empty, if so return 0, else 1
apply(data, 2, function(x) ifelse(x == "", 0, 1))
Let's call the return from (1) the variable concat
. For each row of concat
, paste everything together with a pipe (|
) separating them. Set the new column of data
to equal this.
apply(concat, 1, function(x) paste(x, collapse="|"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With