I would like to split a column of strings on the first two colons, but not on any subsequent colons:
my.data <- read.table(text='
my.string some.data
123:34:56:78 -100
87:65:43:21 -200
a4:b6:c8888 -300
11:bbbb:ccccc -400
uu:vv:ww:xx -500', header = TRUE)
desired.result <- read.table(text='
my.string1 my.string2 my.string3 some.data
123 34 56:78 -100
87 65 43:21 -200
a4 b6 c8888 -300
11 bbbb ccccc -400
uu vv ww:xx -500', header = TRUE)
I have searched extensively and the following question is the closest to my current dilemma:
Split on first comma in string
Thank you for any suggestions. I prefer to use base R.
EDIT:
The number of characters before the first colon is not always two and the number of characters between the first two colons is not always two. So, I edited the example to reflect this.
In base R:
> my.data <- read.table(text='
+
+ my.string some.data
+ 123:34:56:78 -100
+ 87:65:43:21 -200
+ a4:b6:c8888 -300
+ 11:bbbb:ccccc -400
+ uu:vv:ww:xx -500', header = TRUE,stringsAsFactors=FALSE)
> m <- regexec ("^([^:]+):([^:]+):(.*)$",my.data$my.string)
> my.data$my.string1 <- unlist(lapply(regmatches(my.data$my.string,m),'[',c(2)))
> my.data$my.string2 <- unlist(lapply(regmatches(my.data$my.string,m),'[',c(3)))
> my.data$my.string3 <- unlist(lapply(regmatches(my.data$my.string,m),'[',c(4)))
> my.data
my.string some.data my.string1 my.string2 my.string3
1 123:34:56:78 -100 123 34 56:78
2 87:65:43:21 -200 87 65 43:21
3 a4:b6:c8888 -300 a4 b6 c8888
4 11:bbbb:ccccc -400 11 bbbb ccccc
5 uu:vv:ww:xx -500 uu vv ww:xx
You'll see I've used stringsAsFactors=FALSE to ensure that my.string can be processed as a vector of strings.
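As a possible tidy-up of the same idea, here is a minimal sketch that runs regmatches only once and binds the pieces into the layout asked for in the question. The names parts, pieces and result are my own, not from the original answer, and the sketch assumes every string contains at least two colons (a non-matching row would return an empty vector and break the rbind).

```r
my.data <- read.table(text = '
my.string some.data
123:34:56:78 -100
87:65:43:21 -200
a4:b6:c8888 -300
11:bbbb:ccccc -400
uu:vv:ww:xx -500', header = TRUE, stringsAsFactors = FALSE)

# Match once; each list element holds the full match plus the three groups.
m <- regexec("^([^:]+):([^:]+):(.*)$", my.data$my.string)
parts <- do.call(rbind, regmatches(my.data$my.string, m))

# Drop column 1 (the full match) and keep the three captured pieces.
# Assumption: every row matched, so parts has exactly four columns.
pieces <- as.data.frame(parts[, -1, drop = FALSE], stringsAsFactors = FALSE)
names(pieces) <- paste0("my.string", 1:3)

result <- cbind(pieces, some.data = my.data$some.data)
result
```

If my.string has already been read in as a factor, wrapping it in as.character() before matching should give the same behaviour.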
Using package stringr:
library(stringr)
str_match(my.data$my.string, "(.+?):(.+?):(.*)")
[,1] [,2] [,3] [,4]
[1,] "123:34:56:78" "123" "34" "56:78"
[2,] "87:65:43:21" "87" "65" "43:21"
[3,] "a4:b6:c8888" "a4" "b6" "c8888"
[4,] "11:bbbb:ccccc" "11" "bbbb" "ccccc"
[5,] "uu:vv:ww:xx" "uu" "vv" "ww:xx"
UPDATE: With the latest example (above), the solution from Hadley's comment also works:
str_split_fixed(my.data$my.string, ":", 3)
[,1] [,2] [,3]
[1,] "123" "34" "56:78"
[2,] "87" "65" "43:21"
[3,] "a4" "b6" "c8888"
[4,] "11" "bbbb" "ccccc"
[5,] "uu" "vv" "ww:xx"
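Since the question prefers base R, here is a rough sketch of the same "split into at most three pieces" idea without stringr. As far as I know strsplit has no argument to limit the number of pieces, so this swaps the first two colons for a marker character and splits on that instead. split_fixed is a hypothetical helper of my own (not a base R or stringr function), and it assumes the marker "\001" never occurs in the data and that n is at least 2.

```r
x <- c("123:34:56:78", "87:65:43:21", "a4:b6:c8888",
       "11:bbbb:ccccc", "uu:vv:ww:xx")

# Hypothetical helper: split each string on at most the first n - 1
# occurrences of a literal separator, returning a character matrix.
split_fixed <- function(x, sep = ":", n = 3) {
  marker <- "\001"  # assumed not to occur anywhere in x
  for (i in seq_len(n - 1)) {
    # sub() replaces only the first remaining occurrence of sep
    x <- sub(sep, marker, x, fixed = TRUE)
  }
  pieces <- strsplit(x, marker, fixed = TRUE)
  # Pad short rows with "" so that every row has exactly n pieces.
  t(vapply(pieces,
           function(p) c(p, rep("", n - length(p))),
           character(n)))
}

split_fixed(x)
```

The result is a plain character matrix with one row per input string, so it can be attached to my.data with cbind or assigned column by column.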