Is it possible in R to search for a regex in a vector as if all the elements are a collapsed single element? If we collapse all the elements into one to do this, it becomes impossible to put them back to their element-wise form after the search.
here is a vector.
vector<-c("I", "met", "a", "cow")
now, the search word is "meta" (elements 2 and 3 collapsed).
Let's say my task is to merge the two elements across which the search string lies.
So what I expect is this:
vector = "I", "meta", "cow"
Is it possible to do this? Please help.
If you'd like something that matches "meta"
but not "taco"
, this will do the trick:
myFun <- function(vector, word) {
D <- "UnLiKeLyStRiNg"
## Construct a string on which you'll perform regex-search
xx <- paste0(paste0(D, vector, collapse=""), D)
## Construct the regex pattern
start <- paste0("(?<=", D, ")")
mid <- paste0(strsplit(word, "")[[1]], collapse=paste0("(", D, ")?"))
end <- paste0("(?=", D, ")")
pat <- paste0(start, mid, end)
## Use it
strsplit(gsub(pat, word, xx, perl=TRUE), D)[[1]][-1]
}
vector <- c("I", "met", "a", "cow")
myFun(vector, "meta")
# [1] "I" "meta" "cow"
myFun(vector, "taco")
# [1] "I" "met" "a" "cow"
myFun(vector, "Imet")
# [1] "Imet" "a" "cow"
myFun(vector, "Ime")
# [1] "I" "met" "a" "cow"
If only complete elements should merged, you could try this approach:
mergeRegExpr <- function(x, pattern) {
str <- paste(x, sep="", collapse="")
## find starting position of each word
wordStart <- head(cumsum(c(1, nchar(x))), -1)
## look for pattern
rx <- regexpr(pattern=pattern, text=str, fixed=TRUE)
## pos of matching pattern == rx+nchar(pattern)-1
rxEnd <- rx+attr(rx, "match.length")-1
## which vector elements doesn't match pattern
sel <- wordStart < rx | wordStart > rxEnd
## insert merged elements
return(append(x[sel], paste(x[!sel], collapse=""), rx-1))
}
vector <- c("I", "met", "a", "cow")
mergeRegExpr(vector, "meta")
# "I" "meta" "cow"
mergeRegExpr(vector, "acow")
# "I" "met" "acow"
mergeRegExpr(vector, "Imeta")
# "Imeta" "cow"
## partial matching doesn't work
mergeRegExpr(vector, "taco")
# "I" "metacow"
Building on Carl Witthoft's comment, my solution was not with regex, but with basic matching:
# A slightly longer vector
v = c("I", "met", "a", "cow", "today",
"You", "met", "a", "cow", "today")
# Create the combinations of each pair
temp1 = sapply(1:(length(v)-1),
function(x) paste0(v[x], v[x+1]))
# Grab the index of the desired search term
temp2 = which(temp1 %in% "meta")
# The following also works.
# Don't know what's faster/better.
# temp2 = grep("meta", temp1)
# Do some manual substitution and deletion
v[temp2] <- "meta"
v <- v[-(temp2+1)]
I don't think this is an ideal situation at all though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With