I have a dataframe:
df <- data.frame(name=c("john", "david", "callum", "joanna", "allison", "slocum", "lisa"), id=1:7)
df
name id
1 john 1
2 david 2
3 callum 3
4 joanna 4
5 allison 5
6 slocum 6
7 lisa 7
I have a vector of regular expressions that I want to match against df$name:
vec <- c("lis", "^jo", "um$")
The output I want to get is as follows:
name id group
1 john 1 2
2 david 2 NA
3 callum 3 3
4 joanna 4 2
5 allison 5 1
6 slocum 6 3
7 lisa 7 1
I could do this with nested ifelse() calls:
df$group <- ifelse(grepl("lis", df$name), 1,
            ifelse(grepl("^jo", df$name), 2,
            ifelse(grepl("um$", df$name), 3,
            NA)))
However, I want to do this directly from vec, because its values are generated reactively in a Shiny app. Can I assign groups based on each pattern's index in vec?
Further, if a name matches more than one pattern, it should get the group of the first matching pattern. For example, with the vector below, 'callum' matches both "all" and "um$" but should get group 1.
vec <- c("all", "^jo", "um$")
Here are several options:
df$group <- apply(Vectorize(grepl, "pattern")(vec, df$name),
1,
function(ii) which(ii)[1])
# name id group
#1 john 1 2
#2 david 2 NA
#3 callum 3 3
#4 joanna 4 2
#5 allison 5 1
#6 slocum 6 3
#7 lisa 7 1
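To see why this works, it may help to build the logical matrix explicitly. The sketch below is an equivalent base-R formulation using vapply instead of Vectorize, not the code above; the helper name first_match_group is purely illustrative.

```r
# Hypothetical helper (the name is illustrative): for each element of `x`,
# return the index of the first pattern in `patterns` that matches,
# or NA when no pattern matches.
first_match_group <- function(x, patterns) {
  x <- as.character(x)  # guard against factor columns
  # One column per pattern, one row per element of x
  hits <- vapply(patterns, grepl, logical(length(x)), x = x)
  # vapply returns a plain vector when length(x) == 1, so restore the shape
  hits <- matrix(hits, nrow = length(x), ncol = length(patterns))
  # which(row)[1] is the first TRUE column, or NA if the row has none
  apply(hits, 1, function(row) which(row)[1])
}

first_match_group(c("john", "david", "callum"), c("lis", "^jo", "um$"))
```

Because the helper takes the patterns as an argument, it could be called with whatever vec currently holds, e.g. df$group <- first_match_group(df$name, vec).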
Use a named vector and merge on it:
names(vec) <- seq_along(vec)
df <- merge(df, stack(Vectorize(grep, "pattern", SIMPLIFY=FALSE)(vec, df$name)),
by.x="id", by.y="values", all.x = TRUE)
df[!duplicated(df$id),] # to keep only the first match
# id name ind
#1 1 john 2
#2 2 david <NA>
#3 3 callum 3
#4 4 joanna 2
#5 5 allison 1
#6 6 slocum 3
#7 7 lisa 1
A for loop:
df$group <- NA
for (i in rev(seq_along(vec))) {
  TFvec <- grepl(vec[i], df$name)
  df$group[TFvec] <- i
}
df
# name id group
#1 john 1 2
#2 david 2 NA
#3 callum 3 3
#4 joanna 4 2
#5 allison 5 1
#6 slocum 6 3
#7 lisa 7 1
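Iterating in reverse is what gives earlier patterns priority: later entries of vec are written first and are then overwritten by earlier ones. A quick check with the overlapping patterns from the question, run on a fresh copy of the data (df2 and vec2 are illustrative names, not from the original):

```r
# Fresh copy of the example data so the original df is left untouched
df2 <- data.frame(name = c("john", "david", "callum", "joanna",
                           "allison", "slocum", "lisa"),
                  id = 1:7, stringsAsFactors = FALSE)
vec2 <- c("all", "^jo", "um$")

df2$group <- NA_integer_
# Walk the patterns from last to first so that lower indices win ties
for (i in rev(seq_along(vec2))) {
  df2$group[grepl(vec2[i], df2$name)] <- i
}
df2$group
# [1]  2 NA  1  2  1  3 NA
```

Here 'callum' matches both "all" and "um$" and ends up in group 1, while 'lisa' no longer matches anything and stays NA.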
Or you can use outer with stri_match_first_regex from stringi:
library(stringi)
match.mat <- outer(df$name, vec, stri_match_first_regex)
df$group <- apply(match.mat, 1, function(ii) which(!is.na(ii))[1])
# [1] for first match in `vec`
# name id group
#1 john 1 2
#2 david 2 NA
#3 callum 3 3
#4 joanna 4 2
#5 allison 5 1
#6 slocum 6 3
#7 lisa 7 1
A vectorised solution, using rebus and stringi.
library(rebus)
library(stringi)
Create a regular expression that captures any of the values in vec.
vec <- c("lis", "^jo", "um$")
(rx <- or1(vec, capture = TRUE))
## <regex> (lis|^jo|um$)
Match the regex, then convert to factor and integer.
matches <- stri_match_first_regex(df$name, rx)[, 2]
df$group <- as.integer(factor(matches, levels = c("lis", "jo", "um")))
df now looks like this:
name id group
1 john 1 2
2 david 2 NA
3 callum 3 3
4 joanna 4 2
5 allison 5 1
6 slocum 6 3
7 lisa 7 1
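One caveat with a single combined alternation: the regex engine reports the leftmost match in the string, which is not necessarily the earliest pattern in vec. The base-R sketch below illustrates this, with regexpr standing in for stri_match_first_regex and an illustrative pattern order (pats is not from the original). When the order of vec must decide ties, testing each pattern separately is the safer route.

```r
pats <- c("um$", "all")   # "um$" comes first in the vector
rx <- paste0("(", paste(pats, collapse = "|"), ")")

# The combined regex returns the leftmost match in the string ...
hit <- regmatches("callum", regexpr(rx, "callum"))

# ... whereas testing each pattern separately respects the order of `pats`
first_idx <- which(vapply(pats, grepl, logical(1), x = "callum"))[1]

hit                # "all", the second pattern
unname(first_idx)  # 1, the first pattern
```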