I need to check if a string in one column contains a corresponding (numeric) value from the same row of another column, for all rows.
If I were only checking the string for a single pattern this would be straightforward using data.table's like
or grepl
. However, my pattern value is different for every row.
There's a somewhat related question here, but unlike that question I need to create a logical flag indicating if the pattern was present.
Let's say this is my dataset;
DT <- structure(list(category = c("administration", "nurse practitioner",
"trucking", "administration", "warehousing", "warehousing", "trucking",
"nurse practitioner", "nurse practitioner"), industry = c("admin",
"truck", "truck", "admin", "nurse", "admin", "truck", "nurse",
"truck")), .Names = c("category", "industry"), class = "data.frame", row.names = c(NA,
-9L))
setDT(DT)
> DT
category industry
1: administration admin
2: nurse practitioner truck
3: trucking truck
4: administration admin
5: warehousing nurse
6: warehousing admin
7: trucking truck
8: nurse practitioner nurse
9: nurse practitioner truck
My desired result would be a vector like this:
> DT
matches
1: TRUE
2: FALSE
3: TRUE
4: TRUE
5: FALSE
6: FALSE
7: TRUE
8: TRUE
9: FALSE
Of course, 1's and 0's would be just as good as TRUE and FALSE.
Here are some things I tried that didn't work:
apply(DT,1,grepl, pattern = DT[,2], x = DT[,1])
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> apply(DT,1,grepl, pattern = DT[,1], x = DT[,2])
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> grepl(DT[,2], DT[,1])
[1] FALSE
> DT[Vectorize(grepl)(industry, category, fixed = TRUE)]
category industry
1: administration admin
2: trucking truck
3: administration admin
4: trucking truck
5: nurse practitioner nurse
> DT[stringi::stri_detect_fixed(category, industry)]
category industry
1: administration admin
2: trucking truck
3: administration admin
4: trucking truck
5: nurse practitioner nurse
> for(i in 1:nrow(DT)){print(grepl(DT[i,2], DT[i,1]))}
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
> for(i in 1:nrow(DT)){print(grepl(DT[i,2], DT[i,1], fixed = T))}
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
> DT[category %like% industry]
category industry
1: administration admin
2: administration admin
Warning message:
In grepl(pattern, vector) :
argument 'pattern' has length > 1 and only the first element will be used
In the OP's code, the ,
was not used. So, based on the data.table
method, it will subset the rows that corresponds to the i
index.
But, if we are specifying the ,
we are playing with the j
and we get the logical vector as a result
DT[, stri_detect_fixed(category, industry)]
#[1] TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE
Suppose, we keep it in a list
, then we get a data.table
with a column
DT[, list(match=stri_detect_fixed(category, industry))]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With