
Delete only one element from a string if there is at least one match to the element

Tags: string, r, vector

I have the following strings:

remove_none <- "B,B,C,C,D"
remove_A <- "B,B,C,C,A"
remove_only_one <- "B,A,C,A,C,A"

I want to remove only one A if there is at least one A in the string.

I can split the string into a vector, delete the needed value, and paste it back together separated by commas. I know purrr has a function discard(), but it removes all of the matching elements from the vector.

What I need in the result is:

remove_none <- "B,B,C,C,D"
remove_A <- "B,B,C,C"
remove_only_one <- "B,A,C,A,C"

Any advice appreciated!

EDIT: components are always separated by commas

HelloBiology asked Dec 21 '25 22:12


2 Answers

Removing the first instance

If you only want to remove the first instance, you can use base::sub(). We want to remove A and the comma that follows it (if there is one), which gives the pattern A,?. However, in your "B,B,C,C,A" case the A is the final character of the string, so there is no comma after it and we need to remove the preceding comma instead. So our pattern becomes:

sub("A,?|,A", "", s)

i.e. either A, optionally followed by a comma, or ,A. This also works for a string whose first and last elements are both A, e.g.:

remove_start_A <- "A,B,C,A"
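To make it easier to see which part of each string the pattern consumes, here is a small sketch using base R's regexpr() and regmatches(). The strings are the ones from the question plus the extra case above; the variable names m, matched and result are only for illustration.

```r
# the example strings, collected in one named vector
s <- c(
    remove_none = "B,B,C,C,D",
    remove_A = "B,B,C,C,A",
    remove_only_one = "B,A,C,A,C,A",
    remove_start_A = "A,B,C,A"
)
pattern <- "A,?|,A"

# start position of the first match in each string (-1 means no match)
m <- regexpr(pattern, s)

# the text that sub() would delete; strings without a match are dropped,
# leaving ",A", ",A" and "A," for the last three strings
matched <- regmatches(s, m)

# each string with only its first match removed
result <- sub(pattern, "", s)
```

The match always starts at the leftmost position where either alternative fits, which is why the trailing A in "B,B,C,C,A" is removed together with the comma in front of it.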

We can generalise this to apply to any character:

remove_first <- function(s, char = "A") {
    pattern <- sprintf("%s,?|,%s", char, char)
    sub(pattern, "", s)
}

Let's see it in action:

# create the named vector of strings
s <- c(
    remove_none = remove_none,
    remove_A = remove_A,
    remove_only_one = remove_only_one,
    remove_start_A = remove_start_A
)

remove_first(s)
# remove_none        remove_A remove_only_one  remove_start_A
# "B,B,C,C,D"       "B,B,C,C"     "B,C,A,C,A"         "B,C,A"

remove_first(s, "B")
# remove_none        remove_A remove_only_one  remove_start_A
#   "B,C,C,D"       "B,C,C,A"     "A,C,A,C,A"         "A,C,A"
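One caveat: the pattern matches characters, not whole elements, so it assumes every element is a single character, as in the question. If elements could be longer (an assumption that goes beyond the question), a split-based sketch like the following only removes exact matches. The name remove_first_exact and its arguments are hypothetical.

```r
# hypothetical helper: drop the first element that is exactly `element`
remove_first_exact <- function(s, element = "A", delim = ",") {
    pieces <- strsplit(s, delim, fixed = TRUE)
    vapply(pieces, function(vec) {
        i <- match(element, vec)  # index of the first exact match, NA if none
        if (!is.na(i)) vec <- vec[-i]
        paste(vec, collapse = delim)
    }, character(1))
}

sub("A,?|,A", "", "AB,A,C")      # "B,A,C": the pattern matches inside "AB"
remove_first_exact("AB,A,C")     # "AB,C"
remove_first_exact("B,B,C,C,A")  # "B,B,C,C"
```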

Removing the last instance

To remove the last occurrence, reverse the string, apply the above approach and then reverse back:

remove_last <- function(s, char = "A") {
    stringi::stri_reverse(s) |>
        remove_first(char) |>
        stringi::stri_reverse() |>
        setNames(names(s))
}

remove_last(s)
# remove_none        remove_A remove_only_one  remove_start_A
# "B,B,C,C,D"       "B,B,C,C"     "B,A,C,A,C"         "A,B,C"

Reversing the string with an external package (stringi here) is much quicker. If you would rather avoid the dependency, see this question for base R approaches.
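As one possible base R route, here is a minimal sketch that reverses each string by splitting it into characters. The helpers rev_string() and remove_last_base() are hypothetical names, not part of any package, and like the regex approach above they assume single-character elements.

```r
# hypothetical helper: reverse each string using base R only
rev_string <- function(s) {
    chars <- strsplit(s, NULL)  # split every string into single characters
    vapply(chars, function(ch) paste(rev(ch), collapse = ""), character(1))
}

# same pattern-based helper as defined earlier in this answer
remove_first <- function(s, char = "A") {
    pattern <- sprintf("%s,?|,%s", char, char)
    sub(pattern, "", s)
}

# reverse, remove the first match, reverse back
remove_last_base <- function(s, char = "A") {
    rev_string(remove_first(rev_string(s), char))
}

remove_last_base(c("B,B,C,C,D", "B,B,C,C,A", "B,A,C,A,C,A", "A,B,C,A"))
# "B,B,C,C,D" "B,B,C,C" "B,A,C,A,C" "A,B,C"
```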

Edge cases

Let's assume that the following edge cases should return an empty string:

edge_cases <- c(
    one_a = "A", # matched by "A,?"
    one_a_trailing_comma = "A,", # matched by "A,?"
    one_a_leading_comma = ",A" # matched by ",A"
)

This approach returns an empty string for these both forwards and backwards:

# forwards
remove_first(edge_cases)
#    one_a one_a_trailing_comma  one_a_leading_comma
#       ""                   ""                   ""

# backwards
remove_last(edge_cases)
#    one_a one_a_trailing_comma  one_a_leading_comma
#       ""                   ""                   ""
SamR answered Dec 24 '25 13:12


We can use this simple function:

fn <- function(st, delim = ",")
  sapply(strsplit(st, delim), function(vec) paste(vec[vec != "A" | duplicated(vec)], collapse = delim))

This keeps every element that is not "A", plus any "A" that duplicates an earlier one, so only the first "A" is dropped.
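To illustrate how the two conditions combine, here is a short walk-through on one of the question's strings. The intermediate names not_a, seen_before and keep are only for illustration.

```r
vec <- strsplit("B,A,C,A,C,A", ",")[[1]]

not_a <- vec != "A"             # TRUE for every element that is not "A"
seen_before <- duplicated(vec)  # TRUE for any element that appeared earlier
keep <- not_a | seen_before     # FALSE only for the first "A"

paste(vec[keep], collapse = ",")  # "B,C,A,C,A"
```

Note that duplicated() also flags the repeated "C", but those elements are kept anyway because they are not "A".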

fn(remove_none)
# [1] "B,B,C,C,D"
fn(remove_A)
# [1] "B,B,C,C"
fn(remove_only_one)
# [1] "B,C,A,C,A"

For some added generality, we can remove the first n instances of a string.

Edit: motivated by SamR's suggestion of "reverse", we can add that option as well:

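fn2 <- function(st, delim = ",", remove = "A", n = 1, reverse = FALSE) {
  flip <- if (reverse) rev else identity  # reverse = TRUE drops the last n matches instead
  keep <- function(vec) vec[vec != remove | cumsum(vec == remove) > n]  # drop the first n matches
  strsplit(st, delim) |>
    sapply(function(vec) paste(flip(keep(flip(vec))), collapse = delim))
}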

fn2(remove_only_one, n=0)
# [1] "B,A,C,A,C,A"
fn2(remove_only_one, n=1)
# [1] "B,C,A,C,A"
fn2(remove_only_one, n=2)
# [1] "B,C,C,A"

fn2(remove_only_one, remove="C", n=2)
# [1] "B,A,A,A"

fn2(remove_only_one, n=0, reverse=T)
# [1] "B,A,C,A,C,A"
fn2(remove_only_one, n=1, reverse=T)
# [1] "B,A,C,A,C"
fn2(remove_only_one, n=2, reverse=T)
# [1] "B,A,C,C"
fn2(remove_only_one, n=3, reverse=T)
# [1] "B,C,C"
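For anyone wondering why a running count drops exactly the first n matches, here is a small sketch of the idea on a single vector. It uses only base R; the names is_match, running, keep, r and keep_r are illustrative.

```r
vec <- strsplit("B,A,C,A,C,A", ",")[[1]]
n <- 2

is_match <- vec == "A"
running <- cumsum(is_match)      # 0 1 1 2 2 3
keep <- !is_match | running > n  # matches numbered 1..n are dropped

paste(vec[keep], collapse = ",")  # "B,C,C,A"

# reversing before and after applies the same rule from the right-hand end
r <- rev(vec)
keep_r <- r != "A" | cumsum(r == "A") > n
paste(rev(r[keep_r]), collapse = ",")  # "B,A,C,C"
```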
r2evans answered Dec 24 '25 12:12


