Find a word before one of two possible separators

Tags:

regex

r

word:12335
anotherword:2323434
totallydifferentword/455
word/32

I need to grab the character string before the : or / using only base R functions. I can do this with stringr, but I don't want to add another dependency to my package. The words can have a variable number of characters but always end at one of the separators. I don't need to keep what comes after.

Maiasaura asked Oct 02 '12 16:10


2 Answers

Maybe try:

x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")
lapply(strsplit(x, ":|/"), function(z) z[[1]]) # as a list
sapply(strsplit(x, ":|/"), function(z) z[[1]]) # as a character vector
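A slightly stricter variant of the same strsplit idea, assuming you want a result that is guaranteed to be a character vector rather than whatever sapply happens to simplify to: vapply pins the output type, and the character class `[:/]` is an equivalent way to write the `:|/` alternation. The helper name `before_sep` is hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: return the text before the first ":" or "/".
before_sep <- function(x) {
  parts <- strsplit(x, "[:/]")          # split each string on either separator
  vapply(parts, `[`, character(1), 1)   # keep the first piece; always character
}

x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")
before_sep(x)
# returns c("word", "anotherword", "totallydifferentword", "word")
```

If a string has no separator at all, strsplit returns it whole, so before_sep gives back the original string for that element.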

There are regex solutions with gsub that will work too, but in my experience with similar problems, strsplit tends to be less elegant but faster.
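One such regex solution, sketched here as a swapped-in alternative rather than anything from the answers, simply deletes everything from the first separator onward. `sub` is enough because only one match per string needs replacing, and the pattern makes no assumption about which characters the word itself contains.

```r
x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")

# sub() rewrites only the leftmost match in each string. "[:/].*$" starts at
# the first ":" or "/" and runs to the end, so deleting it leaves the prefix.
prefix <- sub("[:/].*$", "", x)
prefix
# returns c("word", "anotherword", "totallydifferentword", "word")

# The prefix may contain capitals, digits or underscores; strings with no
# separator at all come back unchanged.
sub("[:/].*$", "", c("Word_2/32", "no-separator"))
# returns c("Word_2", "no-separator")
```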

I suppose this regex would work as well:

# [/:] matches either separator (a "|" inside [...] would match a literal pipe)
gsub("([a-z]+)([/:])([0-9]+)", "\\1", x)

In this case, though, gsub was faster:

Unit: microseconds
        expr    min     lq median     uq     max
1     GSUB() 19.127 21.460 22.392 23.792 106.362
2 STRSPLIT() 46.650 50.849 53.182 54.581 854.162
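The definitions of the GSUB() and STRSPLIT() wrappers timed above are not shown. To re-run a rough comparison with base R only (in keeping with the question's no-new-dependency constraint), system.time over a large replicated input is a crude but dependency-free sketch. The function names `via_gsub` and `via_strsplit`, the input size, and the repetition are all arbitrary assumptions, and absolute numbers will vary by machine and R version.

```r
x   <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")
big <- rep(x, 25000)  # 100,000 strings; size chosen arbitrarily

# Hypothetical wrappers for the two base-R approaches
via_gsub     <- function(v) gsub("([a-z]+)([/:])([0-9]+)", "\\1", v)
via_strsplit <- function(v) vapply(strsplit(v, "[:/]"), `[`, character(1), 1)

# Sanity check: both must agree before comparing speed
stopifnot(identical(via_gsub(big), via_strsplit(big)))

# Elapsed wall-clock seconds for one run of each
t_gsub     <- system.time(via_gsub(big))[["elapsed"]]
t_strsplit <- system.time(via_strsplit(big))[["elapsed"]]
c(gsub = t_gsub, strsplit = t_strsplit)
```

A single system.time call is noisy; repeating each measurement a few times gives a fairer picture than one run.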
Tyler Rinker answered Nov 16 '22 11:11


Something like this would do the trick in Ruby (see http://rubular.com/r/PzVQVIpKPq):

^(\w+)(?:[:\/])

Starting from the front of the string, capture one or more word characters, stopping at the / or :, which is matched by a non-capturing group.
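The same pattern carries over to base R. This sketch assumes `perl = TRUE`, so that the PCRE engine handles `\w` and the `(?:...)` group as Ruby does; note the doubled backslash inside an R string and that `/` needs no escaping there. The variable names are illustrative only.

```r
x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")

# Ruby pattern plus ".*$" so that sub() discards everything after the word.
via_sub <- sub("^(\\w+)(?:[:/]).*$", "\\1", x, perl = TRUE)

# Alternative: extract only the word with a lookahead. Note that regmatches()
# silently drops elements where regexpr() found no match.
via_regmatches <- regmatches(x, regexpr("^\\w+(?=[:/])", x, perl = TRUE))

via_sub
# returns c("word", "anotherword", "totallydifferentword", "word")
```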

Tadgh answered Nov 16 '22 11:11