
Subtract two strings from each other

Tags:

r

I have the following input

#mydata

ID  variable1  variable2
1    a,b,c,d      c,a 
2    g,f,h        h
3    p,l,m,n,c    c,l

I want to remove the comma-separated elements listed in variable2 from those in variable1, row by row, to get the following output:

#Output
ID  Output 
1    b,d      
2    g,f        
3    p,m,n    

#dput

structure(list(ID = 1:3, variable1 = structure(1:3, .Label = c("a,b,c,d", 
"g,f,h", "p,l,m,n,c"), class = "factor"), variable2 = structure(c(1L, 
 3L, 2L), .Label = c("c,a", "c,l", "h"), class = "factor")), .Names =    c("ID", 
 "variable1", "variable2"), class = "data.frame", row.names = c(NA, 
-3L))
MFR asked Aug 04 '16 07:08


2 Answers

You can try,

Map(setdiff, strsplit(as.character(df$variable1), ','), strsplit(as.character(df$variable2), ','))
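The Map call returns a list with one character vector per row, not the comma-separated column shown in the question. A minimal base R sketch of one way to get there follows, assuming the data frame is named df as in the call above; the names v1, v2 and result are only illustrative, and the data is rebuilt as plain character columns rather than the factors from the dput.

```r
# Rebuild the example data from the question (character columns, not factors).
df <- data.frame(
  ID = 1:3,
  variable1 = c("a,b,c,d", "g,f,h", "p,l,m,n,c"),
  variable2 = c("c,a", "h", "c,l"),
  stringsAsFactors = FALSE
)

# Split both columns on commas: each result is a list with one character vector per row.
v1 <- strsplit(as.character(df$variable1), ",")
v2 <- strsplit(as.character(df$variable2), ",")

# Row-wise set difference, collapsed back into a comma-separated string.
result <- data.frame(
  ID = df$ID,
  Output = mapply(function(x, y) paste(setdiff(x, y), collapse = ","), v1, v2),
  stringsAsFactors = FALSE
)
result
```

Note that setdiff treats each row as a set, so repeated elements within variable1 would also be dropped.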
Sotos answered Oct 21 '22 19:10


We can split each column on "," with strsplit, use Map to take the setdiff of each pair of vectors, and paste each result together with toString. We then name the resulting list with the 'ID' column, stack it into a data.frame, reorder the columns, and rename them to 'ID' and 'Output'.

setNames(stack(setNames(Map(function(x,y) toString(setdiff(x,y)), 
         strsplit(as.character(df1$variable1), ","), 
         strsplit(as.character(df1$variable2), ",")),
              df1$ID))[2:1], c("ID", "Output"))
 #  ID  Output
 #1  1    b, d
 #2  2    g, f
 #3  3 p, m, n
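The nested call can be hard to read, so here is the same pipeline unpacked one step at a time. This is only a sketch: the intermediate names (x, y, diffs, named, long, out) are illustrative, and df1 is rebuilt with character columns instead of the factors from the dput.

```r
df1 <- data.frame(
  ID = 1:3,
  variable1 = c("a,b,c,d", "g,f,h", "p,l,m,n,c"),
  variable2 = c("c,a", "h", "c,l"),
  stringsAsFactors = FALSE
)

# 1. Split each column on commas (one character vector per row).
x <- strsplit(as.character(df1$variable1), ",")
y <- strsplit(as.character(df1$variable2), ",")

# 2. Pairwise set difference, joined with ", " by toString.
diffs <- Map(function(a, b) toString(setdiff(a, b)), x, y)

# 3. Name the list elements with the ID column.
named <- setNames(diffs, df1$ID)

# 4. stack() turns the named list into a data.frame with columns 'values' and 'ind'.
long <- stack(named)

# 5. Put 'ind' first and rename the columns.
out <- setNames(long[2:1], c("ID", "Output"))
out
```

One thing to be aware of: the ID column produced by stack is a factor built from the list names, not the original integer column.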

Or, a more compact option with the splitstackshape package would be

library(splitstackshape)
cSplit(df1, 2:3, ",", "long")[, .(Output = toString(setdiff(variable1, variable2))) , ID]
#   ID  Output
#1:  1    b, d
#2:  2    g, f
#3:  3 p, m, n
akrun answered Oct 21 '22 21:10