Getting all characters ahead of first appearance of special character in R

Tags: string, split, r

I want to get all characters before the first "." in each element, if there is one. Otherwise, I want to get the element back unchanged ("8" -> "8").

Example:

v<-c("7.7.4","8","12.6","11.5.2.1")

I want to get something like this:

[1] "7" "8" "12" "11"

My idea was to split each element at "." and then take only the first piece, but I found no solution that worked.

Krypt asked Dec 19 '22 21:12


1 Answer

You can use sub:

sub("\\..*", "", v)
#[1] "7"  "8"  "12" "11"

or a few stringi options:

library(stringi)
stri_replace_first_regex(v, "\\..*", "")
#[1] "7"  "8"  "12" "11"
# extract vs. replace
stri_extract_first_regex(v, "[^\\.]+")
#[1] "7"  "8"  "12" "11"
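The pattern "\\..*" used above depends on the dot being escaped. The following is a minimal sketch, not part of the original answer, that shows what happens without the escape and adds a position-based base R alternative that avoids regular expressions entirely. The variable names (unescaped, escaped, bracket, by_position) are illustrative only.

```r
v <- c("7.7.4", "8", "12.6", "11.5.2.1")

# Unescaped "." matches any character, so "..*" consumes the whole string.
unescaped <- sub("..*", "", v)

# "\\." matches a literal dot; "[.]" is an equivalent spelling.
escaped <- sub("\\..*", "", v)
bracket <- sub("[.].*", "", v)

# No regex at all: locate the first "." as a fixed string.
# regexpr() returns -1 for elements that contain no ".".
pos <- as.integer(regexpr(".", v, fixed = TRUE))
by_position <- ifelse(pos > 0, substr(v, 1, pos - 1), v)

print(unescaped)
print(escaped)
print(by_position)
```

The fixed-string version may be convenient when the separator is another regex metacharacter, since nothing has to be escaped.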

If you want to use a splitting approach, these will work:

unlist(strsplit(v, "\\..*"))
#[1] "7"  "8"  "12" "11"

# stringi option
unlist(stri_split_regex(v, "\\..*", omit_empty=TRUE))
#[1] "7"  "8"  "12" "11"
unlist(stri_split_fixed(v, ".", n=1, tokens_only=TRUE))
unlist(stri_split_regex(v, "[^\\w]", n=1, tokens_only=TRUE))
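The idea from the question (split at "." and keep only the first piece) can also be written directly in base R. This is a sketch under the assumption that every element is a non-empty string; parts and first_piece are names chosen here for illustration.

```r
v <- c("7.7.4", "8", "12.6", "11.5.2.1")

# fixed = TRUE treats "." as a literal character rather than a regex wildcard.
parts <- strsplit(v, ".", fixed = TRUE)

# Take the first piece of each split; vapply() guarantees a character vector.
first_piece <- vapply(parts, `[`, character(1), 1)

print(first_piece)
```

Unlike the unlist(strsplit(...)) form, this keeps exactly one result per input element, because the pieces after the first "." are discarded explicitly rather than swallowed by the pattern.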

Other sub variations that use a capture group to target the leading characters specifically:

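sub("(\\w+).+", "\\1", v) # \w matches [[:alnum:]_] (i.e. alphanumerics and underscores)
sub("([[:alnum:]]+).+", "\\1", v) # exclude underscores
# caution: ".+" only requires one more character of any kind, so a dot-free element such as "12" would become "1"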

# variations on a theme
sub("(\\w+)\\..*", "\\1", v)
sub("(\\d+)\\..*", "\\1", v) # narrower: \d for digits specifically
sub("(.+?)\\..*", "\\1", v, perl = TRUE) # broader: "." matches any single character; the lazy +? stops at the first "." (a greedy .+ would run to the last one)

# stringi variation just for fun:
stri_extract_first_regex(v, "\\w+")
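
Because the example vector has only one dot-free element, and it is a single character, it can hide differences between these patterns. Here is a small check on a hypothetical vector (v2, not from the question) that includes longer elements without a ".". It is a sketch for comparing variants, not an exhaustive test.

```r
# Hypothetical test vector: "12" and "abc" contain no "."
v2 <- c("7.7.4", "8", "12", "abc", "11.5.2.1")
expected <- c("7", "8", "12", "abc", "11")

remove_tail   <- sub("\\..*", "", v2)           # delete from the first "." onward
capture_dot   <- sub("(\\w+)\\..*", "\\1", v2)  # only matches when a literal "." follows
capture_loose <- sub("(\\w+).+", "\\1", v2)     # matches whenever any character follows

print(identical(remove_tail, expected))
print(identical(capture_dot, expected))
print(capture_loose)  # "12" and "abc" lose their last character here
```

The patterns that require a literal "." leave dot-free elements untouched, which is the behaviour asked for in the question.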
Jota answered Jan 30 '23 22:01