
Split string on first two colons





I would like to split a column of strings on the first two colons, but not on any subsequent colons:

my.data <- read.table(text='

my.string    some.data
123:34:56:78   -100
87:65:43:21    -200
a4:b6:c8888    -300
11:bbbb:ccccc  -400
uu:vv:ww:xx    -500', header = TRUE)

desired.result <- read.table(text='

my.string1  my.string2  my.string3  some.data
123         34          56:78         -100
87          65          43:21         -200
a4          b6          c8888         -300
11          bbbb        ccccc         -400
uu          vv          ww:xx         -500', header = TRUE)

I have searched extensively and the following question is the closest to my current dilemma:

Split on first comma in string

Thank you for any suggestions. I prefer to use base R.


EDIT: The number of characters before the first colon is not always two, and neither is the number of characters between the first two colons, so I edited the example to reflect this.

asked Nov 03 '13 03:11 by Mark Miller


2 Answers

In base R:

> my.data <- read.table(text='
+ my.string    some.data
+ 123:34:56:78   -100
+ 87:65:43:21    -200
+ a4:b6:c8888    -300
+ 11:bbbb:ccccc  -400
+ uu:vv:ww:xx    -500', header = TRUE, stringsAsFactors = FALSE)
> # Groups: text before the 1st colon, text between the 1st and 2nd colons, the rest
> m <- regexec("^([^:]+):([^:]+):(.*)$", my.data$my.string)
> parts <- regmatches(my.data$my.string, m)  # each element: c(full match, g1, g2, g3)
> my.data$my.string1 <- sapply(parts, `[`, 2)
> my.data$my.string2 <- sapply(parts, `[`, 3)
> my.data$my.string3 <- sapply(parts, `[`, 4)
> my.data
      my.string some.data my.string1 my.string2 my.string3
1  123:34:56:78      -100        123         34      56:78
2   87:65:43:21      -200         87         65      43:21
3   a4:b6:c8888      -300         a4         b6      c8888
4 11:bbbb:ccccc      -400         11       bbbb      ccccc
5   uu:vv:ww:xx      -500         uu         vv      ww:xx

You'll see I've used stringsAsFactors = FALSE to ensure that my.string is read as a character vector rather than a factor (since R 4.0.0 this is the default).

answered Oct 27 '22 19:10 by Simon
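For comparison, here is a minimal base R sketch that avoids regexec() entirely: it peels off one piece at a time with sub() and then assembles the column layout of desired.result. This is an alternative technique, not part of the answer above. It assumes every string contains at least two colons; a string with only one colon (e.g. "a:b") would get its second piece copied into my.string3, because sub() returns strings without a match unchanged.

```r
my.data <- read.table(text = '
my.string    some.data
123:34:56:78   -100
87:65:43:21    -200
a4:b6:c8888    -300
11:bbbb:ccccc  -400
uu:vv:ww:xx    -500', header = TRUE, stringsAsFactors = FALSE)

x <- my.data$my.string

# Piece 1: drop everything from the first colon onward
first <- sub(":.*", "", x)
# Remainder after the first colon (assumes at least one colon is present)
rest <- sub("^[^:]*:", "", x)
# Pieces 2 and 3: split the remainder at its own first colon
second <- sub(":.*", "", rest)
third <- sub("^[^:]*:", "", rest)

result <- data.frame(my.string1 = first, my.string2 = second,
                     my.string3 = third, some.data = my.data$some.data,
                     stringsAsFactors = FALSE)
result
```

Each sub() call only ever acts on the leftmost colon, which is what keeps any later colons inside my.string3.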


Using package stringr:

library(stringr)

str_match(my.data$my.string, "(.+?):(.+?):(.*)")

     [,1]            [,2]  [,3]   [,4]   
[1,] "123:34:56:78"  "123" "34"   "56:78"
[2,] "87:65:43:21"   "87"  "65"   "43:21"
[3,] "a4:b6:c8888"   "a4"  "b6"   "c8888"
[4,] "11:bbbb:ccccc" "11"  "bbbb" "ccccc"
[5,] "uu:vv:ww:xx"   "uu"  "vv"   "ww:xx"
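Roughly the same matrix shape can be produced in base R with regexec() and regmatches(). This sketch is an editorial addition, not topchef's code. It swaps the lazy `.+?` groups for the negated-class pattern used in the base R answer, and it pads non-matching strings with NA so rows stay aligned with the input (str_match() likewise returns NA for strings that do not match). The sixth input string is a hypothetical extra case, not from the question.

```r
x <- c("123:34:56:78", "87:65:43:21", "a4:b6:c8888", "11:bbbb:ccccc", "uu:vv:ww:xx",
       "only:one")  # hypothetical extra row with just one colon

# Each element is c(full match, group 1, group 2, group 3), or character(0) if no match
groups <- regmatches(x, regexec("^([^:]+):([^:]+):(.*)$", x))

# Replace non-matches with NA rows, then transpose so each input string is one row
res <- t(vapply(groups,
                function(g) if (length(g) == 4) g else rep(NA_character_, 4),
                character(4)))
res
```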

UPDATE: with the latest example (above), using the solution from Hadley's comment:

str_split_fixed(my.data$my.string, ":", 3)  # n = 3 pieces, so only the first two colons split
     [,1]  [,2]   [,3]   
[1,] "123" "34"   "56:78"
[2,] "87"  "65"   "43:21"
[3,] "a4"  "b6"   "c8888"
[4,] "11"  "bbbb" "ccccc"
[5,] "uu"  "vv"   "ww:xx"
answered Oct 27 '22 18:10 by topchef
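Since the question prefers base R, here is a sketch of a small helper that mimics str_split_fixed() for a fixed (non-regex) separator. The name split_first_n() is hypothetical and is not part of base R or stringr. It assumes the input has no NA values, and it pads missing pieces with "", which is intended to match str_split_fixed()'s padding behaviour.

```r
# Hypothetical helper (not part of base R or stringr): split each string on the
# first (n - 1) occurrences of a fixed separator and return an n-column matrix.
# Assumes x has no NA values; missing pieces are padded with "".
split_first_n <- function(x, sep, n) {
  stopifnot(n >= 1)
  out <- matrix("", nrow = length(x), ncol = n)
  rest <- x
  alive <- rep(TRUE, length(x))  # strings that may still contain more pieces
  for (j in seq_len(n - 1)) {
    pos <- regexpr(sep, rest, fixed = TRUE)  # first separator position, -1 if absent
    hit <- alive & pos > 0
    miss <- alive & !hit
    # Separator found: keep the text before it, carry the text after it forward
    out[hit, j] <- substr(rest[hit], 1, pos[hit] - 1)
    rest[hit] <- substring(rest[hit], pos[hit] + nchar(sep))
    # No separator left: the whole remainder is the last non-empty piece
    out[miss, j] <- rest[miss]
    alive[miss] <- FALSE
  }
  # Whatever is left (further separators included) becomes the final column
  out[alive, n] <- rest[alive]
  out
}

x <- c("123:34:56:78", "87:65:43:21", "a4:b6:c8888", "11:bbbb:ccccc", "uu:vv:ww:xx")
split_first_n(x, ":", 3)
```

Using fixed = TRUE means the separator is matched literally, so characters such as "." or "|" need no escaping.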
