
Subsetting string column in datatable using positions from another column

I have a data table which contains multiple columns of the following type:

   attr1 attr2
1: 01001 01000
2: 11000 10000
3: 00100 00100
4: 01100 01000

DT = setDT(structure(list(attr1 = c("01001", "11000", "00100", "01100"), 
    attr2 = c("01000", "10000", "00100", "01000")), .Names = c("attr1", 
"attr2"), row.names = c(NA, -4L), class = "data.frame"))

All of the columns are strings, not numbers. What I would like to achieve is the following:

1) Find the positions at which "1" appears in the strings of attr1

2) Take the characters of attr2 at those positions

My result in this case would be:

[1] "10" "10" "1"  "10"

For example, in the first row attr1 has "1" in positions 2 and 5, so I take the characters of attr2 in positions 2 and 5 of that row and end up with "10".

My own idea was to strsplit the columns and work with the pieces, but I really hope there is a better way.

User2321 asked Nov 30 '22 15:11



2 Answers

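You can use a variation on @alistaire's regmatches answer, as there is also a replacement function, regmatches<-. So, instead of extracting the characters of attr2 at the positions where attr1 has "1", delete the characters at the positions where attr1 has "0" by replacing them with "" (dt here is the question's DT):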

dt[, matches := `regmatches<-`(attr2, gregexpr("0+", attr1), value="")]

#   attr1 attr2 matches
#1: 01001 01000      10
#2: 11000 10000      10
#3: 00100 00100       1
#4: 01100 01000      10
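
To see what the replacement form is doing, here is a minimal sketch on plain character vectors (the vectors are copied from the question's data; the names m and res are only illustrative). gregexpr() records where the runs of "0" sit in attr1, and regmatches<- then blanks out those same spans in a copy of attr2:

```r
attr1 <- c("01001", "11000", "00100", "01100")
attr2 <- c("01000", "10000", "00100", "01000")

# Match data for the runs of "0" in attr1 (one list element per string).
m <- gregexpr("0+", attr1)
m[[1]]  # for "01001": matches start at 1 and 3, with lengths 1 and 2

# Replace those same spans in a copy of attr2 with "".
# What remains are the characters of attr2 where attr1 has a "1".
res <- attr2
regmatches(res, m) <- ""
res
```

Note that this relies on attr1 and attr2 having the same number of characters in each row, as they do in the question.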

Your idea to strsplit and compare is also feasible:

dt[, matches := mapply(function(x, y) paste(y[x == "1"], collapse = ""), strsplit(attr1, ""), strsplit(attr2, ""))]
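
The same strsplit idea, unrolled on plain vectors so the intermediate pieces are visible. This is only a sketch: the helper name keep_ones and the variable names are made up for illustration, and it assumes each attr1 string is as long as its attr2 partner.

```r
attr1 <- c("01001", "11000", "00100", "01100")
attr2 <- c("01000", "10000", "00100", "01000")

# Split every string into single characters (one list element per row).
chars1 <- strsplit(attr1, "")
chars2 <- strsplit(attr2, "")

# Keep the characters of y where x is "1", then glue them back together.
keep_ones <- function(x, y) paste(y[x == "1"], collapse = "")

# Walk both lists in parallel, one row at a time.
res_split <- mapply(keep_ones, chars1, chars2)
res_split
```

A row with no "1" in attr1 would come back as an empty string "".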
thelatemail answered Dec 05 '22 11:12



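You can use base R's regmatches with match data computed on a different string: find the positions of the "1"s in attr1 with gregexpr, then extract the characters at those positions from attr2: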

dt[, matches := sapply(regmatches(attr2, gregexpr('1+', attr1)), paste, collapse = '')][]
#>    attr1 attr2 matches
#> 1: 01001 01000      10
#> 2: 11000 10000      10
#> 3: 00100 00100       1
#> 4: 01100 01000      10
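
Broken into steps on plain vectors, the one-liner looks roughly like this (a sketch only; pieces and res_extract are illustrative names, and vapply stands in for sapply to pin the result type to character):

```r
attr1 <- c("01001", "11000", "00100", "01100")
attr2 <- c("01000", "10000", "00100", "01000")

# Where are the runs of "1" in attr1?
m <- gregexpr("1+", attr1)

# Pull the substrings at those positions out of attr2 instead of attr1.
pieces <- regmatches(attr2, m)
pieces[[1]]  # two separate matches for row 1: "1" and "0"

# Each row can yield several pieces, so paste them into one string per row.
res_extract <- vapply(pieces, paste, character(1), collapse = "")
res_extract
```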

Data

dt <- structure(list(attr1 = c("01001", "11000", "00100", "01100"), 
        attr2 = c("01000", "10000", "00100", "01000")), .Names = c("attr1", 
    "attr2"), row.names = c(NA, -4L), class = "data.frame")

setDT(dt)
alistaire answered Dec 05 '22 11:12
