I have a string similar to the following
my_string <- "apple,banana,orange,"
And I want to split by , to produce the output:
list(c('apple', 'banana', 'orange', ""))
I thought strsplit would accomplish this but it treats the trailing ',' like it doesn't exist
my_string <- "apple,banana,orange,"
strsplit(my_string, split = ',')
#> [[1]]
#> [1] "apple" "banana" "orange"
Created on 2023-11-15 by the reprex package (v2.0.1)
What is the simplest approach to achieve the desired output?
Some more test cases with example strings and desired outputs
string1 = "apple,banana,orange,"
output1 = list(c('apple', 'banana', 'orange', ''))
string2 = "apple,banana,orange,pear"
output2 = list(c('apple', 'banana', 'orange', 'pear'))
string3 = ",apple,banana,orange"
output3 = list(c('', 'apple', 'banana', 'orange'))
## Examples of non-comma separated strings
# '|' separator
string4 = "|apple|banana|orange|"
output4 = list(c('', 'apple', 'banana', 'orange', ''))
# 'x' separator
string5 = "xapplexbananaxorangex"
output5 = list(c('', 'apple', 'banana', 'orange', ''))
EDIT:
Ideally, the solution should generalize to any splitting character.
Would also prefer a base-R solution (although do still link any packages which supply this functionality since their source code might be useful to look through!)
strsplit Doesn't Give Desired Output?
If you look at ?strsplit, you will find the following note:
Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
That is the reason you don't see the trailing "" when you use strsplit.
Below are some demonstrations
> strsplit("apple,banana,orange,", ",")
[[1]]
[1] "apple" "banana" "orange"
> strsplit(",apple,banana,orange,", ",")
[[1]]
[1] "" "apple" "banana" "orange"
> strsplit(",apple,banana,orange", ",")
[[1]]
[1] "" "apple" "banana" "orange"
> strsplit("apple,banana,orange", ",")
[[1]]
[1] "apple" "banana" "orange"
If you want to treat it as a coding exercise, one base R option is to define a custom recursive function like the one below:
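f <- function(x, sep = ",") {
  # Note: sep is interpreted as a regular expression here
  pat <- sprintf("^(.*?)%s.*", sep)
  s1 <- sub(pat, "\\1", x)                # text before the first separator
  s2 <- sub(paste0("^.*?", sep), "", x)   # text after the first separator
  if (s2 == x) {
    # no separator left: x is the final piece
    return(x)
  }
  c(s1, Recall(s2, sep))
}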
or a variant with substr + regexpr
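f <- function(x, sep = ",") {
  # Note: sep is a regular expression, and idx + 1 assumes it matches one character
  idx <- regexpr(sep, x)
  s1 <- substr(x, 1, idx - 1)             # text before the first separator
  s2 <- substr(x, idx + 1, nchar(x))      # text after the first separator
  if (s2 == x) {
    # no separator found (idx is -1), so x is the final piece
    return(x)
  }
  c(s1, Recall(s2, sep))
}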
such that
> f("apple,banana,orange,")
[1] "apple" "banana" "orange" ""
> f(",apple,banana,orange,")
[1] "" "apple" "banana" "orange" ""
> f(",apple,banana,orange")
[1] "" "apple" "banana" "orange"
> f("apple,banana,orange")
[1] "apple" "banana" "orange"
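Both versions above treat sep as a regular expression, so a separator such as | would need escaping. A minimal non-recursive sketch, assuming the separator should be matched literally, could locate the separators with gregexpr(fixed = TRUE) and cut the pieces out with substring. The name split_keep_empty is made up for illustration and is not part of base R:

```r
# Hypothetical helper (not from the answers above): split a single string on a
# literal separator and keep leading/trailing empty fields.
split_keep_empty <- function(x, sep = ",") {
  stopifnot(is.character(x), length(x) == 1L, nzchar(sep))
  # Start positions of every literal occurrence of sep; -1 means no match
  hits <- gregexpr(sep, x, fixed = TRUE)[[1]]
  if (hits[1] == -1L) {
    return(x)
  }
  width  <- attr(hits, "match.length")
  starts <- c(1L, hits + width)      # each field starts right after a separator
  ends   <- c(hits - 1L, nchar(x))   # and ends right before the next one
  substring(x, starts, ends)
}

split_keep_empty("apple,banana,orange,")
split_keep_empty("|apple|banana|orange|", sep = "|")
```

Like f, this returns a plain character vector for one input string; wrap the call in list() or lapply() if the list shape from the question is needed.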
Pasting another separator onto the end of the string lets strsplit behave as intended.
Alternatively, you could fall back on the scan function, which underpins read.csv and read.table. First, the strsplit approach:
strsplit(paste0(string1, ","), ",")
##[[1]]
##[1] "apple" "banana" "orange" ""
Generalising, with the regex escape stripped from the separator before it is pasted onto the string:
L <- list(string1, string2, string3, string4, string5)
mapply(
  function(x, s) strsplit(paste0(x, gsub("\\\\", "", s)), split = s),
  L,
  c(",", ",", ",", "\\|", "x")
)
##[[1]]
##[1] "apple" "banana" "orange" ""
##
##[[2]]
##[1] "apple" "banana" "orange" "pear"
##
##[[3]]
##[1] "" "apple" "banana" "orange"
##
##[[4]]
##[1] "" "apple" "banana" "orange" ""
##
##[[5]]
##[1] "" "apple" "banana" "orange" ""
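If the separator is always meant literally, a small wrapper around the same paste trick can avoid the escaping step by passing fixed = TRUE to strsplit. This is only a sketch, and the name split_with_sentinel is hypothetical:

```r
# Hypothetical wrapper: append one extra separator so strsplit() keeps the
# trailing empty field, and match the separator literally (no regex escaping).
split_with_sentinel <- function(x, sep = ",") {
  strsplit(paste0(x, sep), split = sep, fixed = TRUE)
}

split_with_sentinel("apple,banana,orange,")
split_with_sentinel("|apple|banana|orange|", sep = "|")
```

Because strsplit is vectorised over its first argument, the wrapper also accepts a character vector of strings that share one separator and returns one list element per string.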
scan option:
scan(text=string1, sep=",", what="")
##Read 4 items
##[1] "apple" "banana" "orange" ""
Generalising:
mapply(
  function(x, s) scan(text = x, sep = s, what = ""),
  L,
  c(",", ",", ",", "|", "x")
)