R's strsplit function matches a given regular expression, discards the matches, and returns the remaining pieces of the string as a character vector.
> strsplit("abc123def", "[0-9]+")
[[1]]
[1] "abc" "def"
How can I split the string the same way with a regular expression, but also keep the matches? I need something like the following.
> FUNCTION("abc123def", "[0-9]+")
[[1]]
[1] "abc" "123" "def"
Using strapply("abc123def", "[0-9]+|[a-z]+") from the gsubfn package works here, but what if the parts of the string between the matches cannot be described by a regular expression?
The strsplit() function in R splits each element of a character vector into substrings at the matches of the pattern passed as its split argument.
The str_split() function from the stringr package can also be used to split a string into multiple pieces. It uses the following syntax: str_split(string, pattern), where string is a character vector and pattern is the pattern to split on.
Fundamentally, it seems to me that what you want is not to split on [0-9]+ but to split on the transition between [0-9]+ and everything else. In your string, that transition is not pre-existing. To insert it, you could pre-process with gsub and back-referencing:
test <- "abc123def"
strsplit( gsub("([0-9]+)","~\\1~",test), "~" )
[[1]]
[1] "abc" "123" "def"
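The gsub idea above can be wrapped in a small helper. This is only a sketch: the name split_keep and its sep argument are made up for illustration, and it assumes the marker character (the control character "\001" by default, instead of "~") never occurs in the input. It also drops the empty string that strsplit would return when a match sits at the very start of the input.

```r
# Hypothetical helper (not part of base R): split x on `pattern`, keeping the matches.
# Assumption: `sep` is a marker that never appears in x.
split_keep <- function(x, pattern, sep = "\001") {
  if (any(grepl(sep, x, fixed = TRUE))) {
    stop("marker character occurs in the input; choose another 'sep'")
  }
  # Wrap every match in markers, e.g. "abc123def" -> "abc<sep>123<sep>def"
  marked <- gsub(paste0("(", pattern, ")"), paste0(sep, "\\1", sep), x, perl = TRUE)
  # Split on the literal marker, then drop empty pieces
  pieces <- strsplit(marked, sep, fixed = TRUE)
  lapply(pieces, function(p) p[nzchar(p)])
}

split_keep("abc123def", "[0-9]+")
# [[1]]
# [1] "abc" "123" "def"
```

Because the pattern is wrapped in one outer group, \\1 always refers to the whole match, even if the pattern has its own groups or alternations.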
You could use lookaround assertions to split at the zero-width boundaries between digits and non-digits.
> test <- "abc123def"
> strsplit(test, "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)", perl=TRUE)
[[1]]
[1] "abc" "123" "def"
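The lookaround pattern above is written for the digit versus non-digit case. For an arbitrary pattern, where the rest of the string cannot be described by a regular expression, one possible base R alternative (a different technique from the two answers above) is to collect the matches and the gaps between them with gregexpr and regmatches, then interleave them. The function name split_keep_matches is hypothetical, and this is a sketch rather than a tested general solution.

```r
# Hypothetical helper: return matches and the text between them, in order.
split_keep_matches <- function(x, pattern) {
  m    <- gregexpr(pattern, x, perl = TRUE)
  hits <- regmatches(x, m)                 # the matched pieces
  gaps <- regmatches(x, m, invert = TRUE)  # the text between matches
  Map(function(g, h) {
    # gaps has one more element than hits, so pad hits before interleaving
    out <- c(rbind(g, c(h, "")))
    out[nzchar(out)]                       # drop empty pieces
  }, gaps, hits)
}

split_keep_matches("abc123def", "[0-9]+")
# [[1]]
# [1] "abc" "123" "def"
```

Unlike the gsub approach, this needs no marker character, so it does not depend on any character being absent from the input.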