I have a main string that looks like this:
my_main <- "ABCDEFGHIJ"
What I want to do is to sequentially mask at every position with another pattern string:
my_pattern <- "x*x" # the length could be varied from 1 up to length of my_main
Every character that overlap with *
will be kept, other will be replaced with x
.
The final result is a vector of strings that contain these:
xBxDEFGHIJ
AxCxEFGHIJ
ABxDxFGHIJ
ABCxExGHIJ
ABCDxFxHIJ
ABCDExGxIJ
ABCDEFxHxJ
ABCDEFGxIx
Next if the pattern is
my_pattern <- "xx**x"
The result would be:
xxCDxFGHIJ
AxxDExGHIJ
ABxxEFxHIJ
ABCxxFGxIJ
ABCDxxGHxJ
ABCDExxHIx
How can I achieve that?
This might be a little over-complicated, but it's a start:
I'm going to reuse Reduce_frame
from https://stackoverflow.com/a/70945868/3358272.
Reduce_frame <- function(data, expr, init) {
expr <- substitute(expr)
out <- rep(init[1][NA], nrow(data))
for (rn in seq_len(nrow(data))) {
out[rn] <- init <- eval(expr, envir = data[rn,])
}
out
}
From here, let's split the pattern into a frame (for ease of access, if nothing else):
repl <- subset(
data.frame(p = strsplit(my_pattern, "")[[1]], i = seq_len(nchar(my_pattern))),
p != "*")
repl
# p i
# 1 x 1
# 3 x 3
From here, we can do it once with:
tail(Reduce_frame(repl, `substring<-`(init, i, i, p), init = my_main), 1)
# [1] "xBxDEFGHIJ"
Which means we can iterate fairly easily:
sapply(c(0, seq_len(nchar(my_main) - nchar(my_pattern))), function(offset) {
tail(Reduce_frame(transform(repl, i = i + offset),
`substring<-`(init, i, i, p), init = my_main), 1)
})
# [1] "xBxDEFGHIJ" "AxCxEFGHIJ" "ABxDxFGHIJ" "ABCxExGHIJ" "ABCDxFxHIJ" "ABCDExGxIJ" "ABCDEFxHxJ" "ABCDEFGxIx"
To use your second pattern,
my_pattern <- "xx**x"
repl <- transform(...) # from above
## the rest of this code is unchanged from above
sapply(c(0, seq_len(nchar(my_main) - nchar(my_pattern))), function(offset) {
tail(Reduce_frame(transform(repl, i = i + offset),
`substring<-`(init, i, i, p), init = my_main), 1)
})
# [1] "xxCDxFGHIJ" "AxxDExGHIJ" "ABxxEFxHIJ" "ABCxxFGxIJ" "ABCDxxGHxJ" "ABCDExxHIx"
So this can be easily functionized:
Reduce_frame <- ... # defined above
func <- function(S, pattern) {
stopifnot(nchar(S) >= nchar(pattern))
repl <- subset(
data.frame(p = strsplit(pattern, "")[[1]], i = seq_len(nchar(pattern))),
p != "*")
sapply(c(0, seq_len(nchar(S) - nchar(pattern))), function(offset) {
tail(Reduce_frame(transform(repl, i = i + offset),
`substring<-`(init, i, i, p), init = S), 1)
})
}
func("ABCDEFGHIJ", "x*x")
# [1] "xBxDEFGHIJ" "AxCxEFGHIJ" "ABxDxFGHIJ" "ABCxExGHIJ" "ABCDxFxHIJ" "ABCDExGxIJ" "ABCDEFxHxJ" "ABCDEFGxIx"
func("ABCDEFGHIJ", "xx**x")
# [1] "xxCDxFGHIJ" "AxxDExGHIJ" "ABxxEFxHIJ" "ABCxxFGxIJ" "ABCDxxGHxJ" "ABCDExxHIx"
Here's one way using strsplit
, grepl
, and paste
.
f <- \(mm, mp) {
m <- el(strsplit(mm, ''))
p <- el(strsplit(mp, ''))
i <- which(!grepl(p, pattern='\\*'))
vapply(c(0L, seq_len(length(m) - max(i))), \(j) {
m[i + j] <- p[i]
paste(m, collapse='')
}, vector('character', 1L))
}
f('ABCDEFGHIJ', 'x*x')
# [1] "xBxDEFGHIJ" "AxCxEFGHIJ" "ABxDxFGHIJ" "ABCxExGHIJ" "ABCDxFxHIJ"
# [6] "ABCDExGxIJ" "ABCDEFxHxJ" "ABCDEFGxIx"
f('ABCDEFGHIJ', 'x**x')
# [1] "xBCxEFGHIJ" "AxCDxFGHIJ" "ABxDExGHIJ" "ABCxEFxHIJ" "ABCDxFGxIJ"
# [6] "ABCDExGHxJ" "ABCDEFxHIx"
f('ABCDEFGHIJ', 'xx**x')
# [1] "xxCDxFGHIJ" "AxxDExGHIJ" "ABxxEFxHIJ" "ABCxxFGxIJ" "ABCDxxGHxJ"
# [6] "ABCDExxHIx"
f('ABCDEFGHIJ', 'kk**krr')
# [1] "kkCDkrrHIJ" "AkkDEkrrIJ" "ABkkEFkrrJ" "ABCkkFGkrr"
f('ABCDEFGHIJ', 'kk**kr*r')
# [1] "kkCDkrGrIJ" "AkkDEkrHrJ" "ABkkEFkrIr"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With