
R: Split string and keep substrings right-hand of match?

Tags:

r

strsplit

How can I do this with strsplit() in R? Splitting should stop once no first names separated by slashes remain. The right-hand substring should be kept for each name, as shown in the result below.

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# result: 
c("tim meyer XY900 123kncjd", "tom meyer XY900 123kncjd", "sepp moser VK123 456xyz", "max moser VK123 456xyz", "peter moser VK123 456xyz")
Kay asked Feb 05 '16 14:02


2 Answers

Here is one possibility using a few of the different base string functions.

## get the lengths of the output for each first name
len <- lengths(gregexpr("/", sub(" .*", "", a), fixed = TRUE)) + 1L
## extract all the first names 
## using the fact that they all end at the first space character
fn <- scan(text = a, sep = "/", what = "", comment.char = " ")
## paste them together
paste0(fn, rep(regmatches(a, regexpr(" .*", a)), len))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
# [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
# [5] "peter moser VK123 456xyz"
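
One caveat with the counting step: gregexpr() returns -1 (a vector of length one) when there is no "/", so an entry with a single first name would be counted as two. A minimal sketch of a variant that counts the split pieces instead, assuming such entries can occur. The vector a2 and its second element are made-up test data, not from the question:

```r
# a2 is hypothetical input: the second entry has only one first name
a2 <- c("tim/tom meyer XY900 123kncjd", "anna huber AB111 789abc")

# first token of each string (everything before the first space)
first <- sub(" .*", "", a2)

# split the first token on "/" and count the pieces directly
parts <- strsplit(first, "/", fixed = TRUE)
len <- lengths(parts)

# remainder of each string, including its leading space
rest <- sub("^[^ ]*", "", a2)

res <- paste0(unlist(parts), rep(rest, len))
print(res)
```

Counting the pieces of strsplit() avoids the off-by-one for entries without a slash, and sub() keeps one remainder per input even when nothing matches.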

Addition: Here is a second possibility that uses a little less code. It might be a little faster too.

s <- strsplit(a, "\\/|( .*)")
paste0(unlist(s), rep(regmatches(a, regexpr(" .*", a)), lengths(s)))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
# [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
# [5] "peter moser VK123 456xyz"
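
To see why this works, it may help to look at the intermediate list. The pattern treats both "/" and everything from the first space onward as a separator, so only the first names are left, and lengths(s) gives the number of copies needed of each remainder. A small sketch, with the pattern written without the optional escape and group (this should be equivalent):

```r
a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# "/" and "first space plus the rest of the string" both act as separators
s <- strsplit(a, "/| .*")
# s[[1]] is c("tim", "tom"); s[[2]] is c("sepp", "max", "peter")

# one remainder per input string, each starting with its leading space
rest <- regmatches(a, regexpr(" .*", a))

# repeat each remainder once per first name, then glue the pieces together
res <- paste0(unlist(s), rep(rest, lengths(s)))
print(res)
```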
Rich Scriven answered Oct 30 '22 07:10


I'd do it like this (with stringi):

library("stringi")

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

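## first names: everything before the first space, split on "/"
start <- stri_split_fixed(stri_match_first_regex(a, "(.+?)[ ]")[, 2], "/")
## remainder: everything after the first space
end <- stri_match_first_regex(a, "[ ](.+)")[, 2]

## append the remainder to each first name
for (i in seq_along(end)) {
    start[[i]] <- paste(start[[i]], end[i])
}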

unlist(start)

## [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd" "sepp moser VK123 456xyz" 
## [4] "max moser VK123 456xyz"   "peter moser VK123 456xyz"
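
The loop can also be written without explicit indexing. A sketch using Map() from base R, with sub() and strsplit() as base-R stand-ins for the two stringi calls so that it runs without extra packages. This is an alternative formulation under that assumption, not the code from the answer:

```r
a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# first names: the token before the first space, split on "/"
start <- strsplit(sub(" .*", "", a), "/", fixed = TRUE)

# remainder: everything after the first space
end <- sub("^[^ ]* ", "", a)

# paste each group of first names with its own remainder, then flatten
res <- unlist(Map(paste, start, end))
print(res)
```

Map() pairs each element of start with the matching element of end, so no index variable is needed and empty input is handled without special cases.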
Marta answered Oct 30 '22 07:10