
Subset not based on exact match, but partial in R

Tags: substring, regex, r

This is a follow-up question from here: Subsetting a string based on pre- and suffix

When you have this command:

    d <- subset(b, b$X %in% test)  

This command looks for all values in b$X that exactly match a value in test. How can I change it so that it is enough for a value in b$X to contain a value from test?
For example, if b$X has the value "something" and test has "thing", I would regard this as a match.

Important update: test has 512 values, not just one as in the example.

user3236594 asked Jan 28 '14 14:01


1 Answer

You can replace %in% with grepl:

    # examples
    x <- c("thing", "something", "some", "else")
    test <- c("thing", "some")

    # exact match
    x %in% test
    # [1]  TRUE FALSE  TRUE FALSE

    # substring match (regex)
    pattern <- paste(test, collapse = "|") # create regex pattern
    grepl(pattern, x)
    # [1]  TRUE  TRUE  TRUE FALSE

The whole command for your task:

    d <- subset(b, grepl(paste(test, collapse = "|"), b$X))

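In regular expressions, "|" means logical OR, so the combined pattern matches whenever at least one of the values in test occurs anywhere in a value of b$X.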
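One caveat: because the pasted pattern is interpreted as a regular expression, values in test that contain metacharacters such as "." or "(" may match more than intended. A minimal sketch of a literal-substring alternative follows, assuming the values in test should be treated as plain text. The data frame b and the values in test below are made up for illustration and are not from the question.

```r
# Hypothetical example data (not from the original question)
b <- data.frame(X = c("something", "a.b", "axb", "else"),
                stringsAsFactors = FALSE)
test <- c("thing", "a.b")

# Regex alternation: "." matches any character, so "axb" matches too
regex_hit <- grepl(paste(test, collapse = "|"), b$X)

# Literal matching: check each value of test with fixed = TRUE,
# then combine the resulting logical vectors element-wise with OR
fixed_hit <- Reduce(`|`, lapply(test, function(p) grepl(p, b$X, fixed = TRUE)))

# Keep only the rows whose X contains at least one value of test literally
d <- b[fixed_hit, , drop = FALSE]
```

With 512 values in test this runs grepl once per value, which is slower than a single regex call but avoids having to escape special characters.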

Sven Hohenstein answered Nov 15 '22 07:11