I have the following strings:
remove_none <- "B,B,C,C,D"
remove_A <- "B,B,C,C,A"
remove_only_one <- "B,A,C,A,C,A"
I want to remove only one A if there is at least one A in the string.
I can split the string into a vector, delete the needed value, and paste it back together separated by commas. I know purrr has a discard() function, but it removes all of the matching elements from the vector.
What I need in the result is:
remove_none <- "B,B,C,C,D"
remove_A <- "B,B,C,C"
remove_only_one <- "B,A,C,A,C"
Any advice appreciated!
EDIT: components are always separated by commas
To remove only the first instance, we can use base::sub(). We want to remove A and the comma that follows it, if there is one, which gives the pattern "A,?". However, there is also your "B,B,C,C,A" case, where the A to remove is the final character of the string. There is no comma after it, so we remove the preceding comma instead. The pattern becomes:
sub("A,?|,A", "", s)

i.e. either A optionally followed by a comma, or a comma followed by A. Because sub() replaces only the first (leftmost) match, exactly one A is removed. This also works when both the first and last characters are A, e.g.:
remove_start_A <- "A,B,C,A"
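To see which alternative of the pattern actually fires on each string, one can inspect the match itself rather than the result. A small sketch using base R's regexpr() and regmatches(); the vector name x is only for illustration:

```r
pattern <- "A,?|,A"
x <- c("B,B,C,C,A", "B,A,C,A,C,A", "A,B,C,A")

# regexpr() reports the first (leftmost) match in each string,
# which is exactly the piece that sub() would replace
m <- regexpr(pattern, x)
matched <- regmatches(x, m)
matched
# [1] ",A" ",A" "A,"
```

A leading A is matched together with its trailing comma, while an A later in the string is matched together with the comma before it.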
We can generalise this to apply to any character:
remove_first <- function(s, char = "A") {
  pattern <- sprintf("%s,?|,%s", char, char)
  sub(pattern, "", s)
}
Let's see it in action:
# create the named vector of strings
s <- c(
  remove_none = remove_none,
  remove_A = remove_A,
  remove_only_one = remove_only_one,
  remove_start_A = remove_start_A
)
remove_first(s)
# remove_none remove_A remove_only_one remove_start_A
# "B,B,C,C,D" "B,B,C,C" "B,C,A,C,A" "B,C,A"
remove_first(s, "B")
# remove_none remove_A remove_only_one remove_start_A
# "B,C,C,D" "B,C,C,A" "A,C,A,C,A" "A,C,A"
To remove the last occurrence, reverse the string, apply the above approach and then reverse back:
remove_last <- function(s, char = "A") {
  stringi::stri_reverse(s) |>
    remove_first(char) |>
    stringi::stri_reverse() |>
    setNames(names(s))
}
remove_last(s)
# remove_none remove_A remove_only_one remove_start_A
# "B,B,C,C,D" "B,B,C,C" "B,A,C,A,C" "A,B,C"
If you would rather not depend on an external package to reverse the string (although stringi is much quicker), see this question for base R approaches.
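For reference, one common base R way to reverse a string is to split it into characters, reverse them, and paste them back. This is only a sketch: the helper name str_rev is made up here, and it assumes plain text without combining characters.

```r
# hypothetical helper: reverse each string in a character vector
str_rev <- function(s) {
  vapply(
    strsplit(s, NULL),                          # split into single characters
    function(ch) paste(rev(ch), collapse = ""), # reverse and re-join
    character(1)
  )
}

reversed <- str_rev(c("B,B,C,C,A", "A,B,C,A"))
reversed
# [1] "A,C,C,B,B" "A,C,B,A"
```

In remove_last(), str_rev() could stand in for both stringi::stri_reverse() calls.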
Let's assume that the following edge cases should return an empty string:
edge_cases <- c(
  one_a = "A", # matched by "A,?"
  one_a_trailing_comma = "A,", # matched by "A,?"
  one_a_leading_comma = ",A" # matched by ",A"
)
This approach returns an empty string for these both forwards and backwards:
# forwards
remove_first(edge_cases)
# one_a one_a_trailing_comma one_a_leading_comma
# "" "" ""
# backwards
remove_last(edge_cases)
# one_a one_a_trailing_comma one_a_leading_comma
# "" "" ""
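One caveat: the pattern assumes single-character components, as in the question. If components can be longer (say "AB"), "A,?" would also match the A inside them. A possible variant, offered as a sketch only, anchors the match to whole components using a lookahead with perl = TRUE. The function name remove_first_token is hypothetical, and the code assumes the token contains no regex metacharacters.

```r
# hypothetical variant: remove the first whole component equal to `token`
remove_first_token <- function(s, token = "A") {
  # either the token at the very start (plus its trailing comma, if any),
  # or a comma plus the token, provided a comma or the end of string follows
  pattern <- sprintf("^%s(,|$)|,%s(?=,|$)", token, token)
  sub(pattern, "", s, perl = TRUE)
}

# the original pattern clips the A out of "AB"
clipped <- sub("A,?|,A", "", "AB,A,C")
clipped
# [1] "B,A,C"

safe <- remove_first_token(c("AB,A,C", "B,AB,A", "A,B,C,A", "B,B,C,C,D"))
safe
# [1] "AB,C"      "B,AB"      "B,C,A"     "B,B,C,C,D"
```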
We can use this simple function:
fn <- function(st, delim = ",")
sapply(strsplit(st, delim), function(vec) paste(vec[vec != "A" | duplicated(vec)], collapse = delim))
This keeps every element that is not "A", plus every "A" that duplicated() flags as a repeat of an earlier one, so only the first "A" is dropped.
fn(remove_none)
# [1] "B,B,C,C,D"
fn(remove_A)
# [1] "B,B,C,C"
fn(remove_only_one)
# [1] "B,C,A,C,A"
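To make the filter easier to follow, here are the intermediate logical vectors for one of the strings (the names vec and keep are only for illustration):

```r
vec <- strsplit("B,A,C,A,C,A", ",")[[1]]

vec != "A"
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE

duplicated(vec)
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

# keep non-"A" elements, and any "A" that has already appeared before
keep <- vec != "A" | duplicated(vec)
keep
# [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

result <- paste(vec[keep], collapse = ",")
result
# [1] "B,C,A,C,A"
```

duplicated() is also TRUE for the repeated "C" values, but those are kept by the first condition anyway, so it makes no difference.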
For some added generality, we can remove the first n instances of a string.
Edit: motivated by SamR's suggestion of "reverse", we can add that option as well:
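fn2 <- function(st, delim = ",", remove = "A", n = 1, reverse = FALSE) {
  flip <- if (reverse) rev else identity # reverse = TRUE drops the last n matches
  strsplit(st, delim) |>
    sapply(function(vec) {
      vec <- flip(vec)
      paste(flip(vec[vec != remove | cumsum(vec == remove) > n]), collapse = delim)
    })
}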
fn2(remove_only_one, n=0)
# [1] "B,A,C,A,C,A"
fn2(remove_only_one, n=1)
# [1] "B,C,A,C,A"
fn2(remove_only_one, n=2)
# [1] "B,C,C,A"
fn2(remove_only_one, remove="C", n=2)
# [1] "B,A,A,A"
fn2(remove_only_one, n=0, reverse=TRUE)
# [1] "B,A,C,A,C,A"
fn2(remove_only_one, n=1, reverse=TRUE)
# [1] "B,A,C,A,C"
fn2(remove_only_one, n=2, reverse=TRUE)
# [1] "B,A,C,C"
fn2(remove_only_one, n=3, reverse=TRUE)
# [1] "B,C,C"
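The cumsum() condition may be easier to follow when worked through by hand. The sketch below applies the same mask directly to one vector, forwards and then on the reversed vector; the names vec, rvec, keep and keep_r are only for illustration.

```r
vec <- strsplit("B,A,C,A,C,A", ",")[[1]]
n <- 2

# running count of how many "A"s have been seen so far
cumsum(vec == "A")
# [1] 0 1 1 2 2 3

# keep non-"A" elements, and any "A" after the first n of them
keep <- vec != "A" | cumsum(vec == "A") > n
forwards <- paste(vec[keep], collapse = ",")
forwards
# [1] "B,C,C,A"

# same mask on the reversed vector drops the last n "A"s instead
rvec <- rev(vec)
keep_r <- rvec != "A" | cumsum(rvec == "A") > n
backwards <- paste(rev(rvec[keep_r]), collapse = ",")
backwards
# [1] "B,A,C,C"
```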