I'm not new to R, but I am relatively new to regular expressions.
A similar question can be found here, but it asks how to split on the first comma rather than the last one.
As an example, if I use
> strsplit("UK, USA, Germany", ", ")
[[1]]
[1] "UK" "USA" "Germany"
I want to get
[[1]]
[1] "UK, USA" "Germany"
And if I use
> strsplit("London, Washington, D.C., Berlin", ", ")
[[1]]
[1] "London" "Washington" "D.C." "Berlin"
I want to get
[[1]]
[1] "London, Washington, D.C." "Berlin"
One viable approach, I think, is to replace the last comma with something else, such as $, #, *, ..., and then use strsplit() to split the string on that replacement (make sure it is unique!). However, I'd be happier if the problem could be solved directly with a built-in function.
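A minimal sketch of that replace-then-split workaround in base R, assuming the control character "\001" never occurs in the input (it is an arbitrary, hypothetical choice of marker). sub() rewrites only the last comma because the pattern must be followed by non-comma characters up to the end of the string; strsplit() then splits on the marker.

```r
# Hypothetical marker, assumed never to appear in the data.
marker <- "\001"

x <- c("UK, USA, Germany", "London, Washington, D.C., Berlin")

# ",\\s*([^,]*)$" matches the last comma, any following whitespace, and the
# rest of the string; the captured remainder is put back after the marker.
marked <- sub(",\\s*([^,]*)$", paste0(marker, "\\1"), x)

# Split on the marker literally (fixed = TRUE, no regex interpretation).
parts <- strsplit(marked, marker, fixed = TRUE)
parts
```

The catch is exactly the one noted above: if the marker can occur in the data, the split goes wrong, which is why a single regex split is usually preferable.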
So how can I do that?
Comma-separated values stored in an R vector can be split with strsplit(), which returns a list, followed by unlist() to flatten that list into a single character vector. For example, if a vector x contains comma-separated values, unlist(strsplit(x, ",")) returns all of the individual values.
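A short illustration of the strsplit()-then-unlist() pattern, using a small made-up vector x:

```r
# A character vector whose elements hold comma-separated values
x <- c("a,b", "c,d,e")

pieces <- strsplit(x, ",")  # a list: one character vector per element of x
flat <- unlist(pieces)      # flattened into a single character vector
flat
## [1] "a" "b" "c" "d" "e"
```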
To split a string in R, use the strsplit() function. strsplit() is a built-in (base R) function that splits each element of a character vector into substrings. It returns a list with one element per element of the input, each holding the pieces that input string was split into.
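A sketch of the list structure strsplit() returns, with one list element per input string (the example strings are made up for illustration):

```r
x <- c("UK, USA, Germany", "London, Berlin")

# fixed = TRUE treats ", " as a literal separator rather than a regex
parts <- strsplit(x, ", ", fixed = TRUE)

length(parts)    # one list element per input string
## [1] 2
lengths(parts)   # number of pieces in each element
## [1] 3 2
parts[[2]]
## [1] "London" "Berlin"
```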
Here's one approach:
strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)
## [[1]]
## [1] "UK, USA" " Germany"
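To see why ",(?=[^,]+$)" picks out only the last comma, it can help to list where it matches. The lookahead (?=[^,]+$) requires that everything after the comma, up to the end of the string, contains no further comma, and that is true only for the final one. A small sketch using gregexpr() with perl = TRUE (needed for the lookahead):

```r
s <- "UK, USA, Germany"

# Positions where the lookahead pattern matches: only a comma followed by
# non-comma characters all the way to the end of the string qualifies.
m <- gregexpr(",(?=[^,]+$)", s, perl = TRUE)
as.integer(m[[1]])
## [1] 8

# For comparison, a plain "," matches every comma.
as.integer(gregexpr(",", s)[[1]])
## [1] 3 8
```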
To drop the leading space from the second piece, you may want:
strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)
## [[1]]
## [1] "UK, USA" "Germany"
This pattern also works when there is no space after the comma:
strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)
## [[1]]
## [1] "UK, USA" "Germany"
##
## [[2]]
## [1] "UK, USA" "Germany"
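The same regex can be wrapped in a small helper and applied to the question's second example. The name split_last_comma is hypothetical; it simply calls strsplit() with the pattern from this answer. A string with no comma comes back unsplit.

```r
# Hypothetical helper: split each string at its last comma, dropping any
# whitespace that follows that comma (regex taken from the answer above).
split_last_comma <- function(x) {
  strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
}

res <- split_last_comma(c("London, Washington, D.C., Berlin", "Germany"))
res
## [[1]]
## [1] "London, Washington, D.C." "Berlin"
##
## [[2]]
## [1] "Germany"
```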
You can use the stri_split family of functions (here stri_split_fixed()) from the stringi package:
library(stringi)
x <- "USA,UK,Poland"
stri_split_fixed(x, ",") # standard split by comma
[[1]]
[1] "USA" "UK" "Poland"
stri_split_fixed(x,",",n = 2) # set the max number of elements
[[1]]
[1] "USA" "UK,Poland"
Unfortunately, there is no parameter that makes the split start from the end of the string instead of the beginning, but we can work around this with stri_reverse(): reverse the string, split it at the first comma, then reverse the pieces back.
stri_split_fixed(stri_reverse(x), ",", n = 2) # reverse, then split at the first comma
[[1]]
[1] "dnaloP" "KU,ASU"
stri_reverse(stri_split_fixed(stri_reverse(x), ",", n = 2)[[1]]) # reverse each piece back
[1] "Poland" "USA,UK"
stri_reverse(stri_split_fixed(stri_reverse(x), ",", n = 2)[[1]])[2:1] # swap the pieces back into the original order
[1] "USA,UK" "Poland"
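As a base-R alternative to the reverse-and-swap trick (a swapped-in technique, not part of the stringi answer), one can locate the last comma directly with regexpr() and cut the string there with substr()/substring(). This sketch assumes every element contains at least one comma; regexpr() returns -1 for elements without one.

```r
x <- c("USA,UK,Poland", "UK, USA, Germany")

# Start of the first match of ",[^,]*$", which is the position of the last
# comma in each string (-1 if there is none).
pos <- as.integer(regexpr(",[^,]*$", x))

# Everything before the last comma, and everything after it with leading
# whitespace trimmed.
head_part <- substr(x, 1, pos - 1)
tail_part <- trimws(substring(x, pos + 1))

head_part
tail_part
```

Because the split position is computed rather than found by reversing the string, a multi-character separator such as ", " needs no special handling beyond the trimws() call.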