
split on last occurrence of digit, take 2nd part

Tags:

regex

r

If I have a string and want to split it on the last digit and keep the last part of the split, how can I do that?

x <- c("ID", paste0("X", 1:10, state.name[1:10]))

I'd like

 [1] NA            "Alabama"     "Alaska"      "Arizona"     "Arkansas"   
 [6] "California"  "Colorado"    "Connecticut" "Delaware"    "Florida"    
[11] "Georgia"    

But would settle for:

 [1] "ID"          "Alabama"     "Alaska"      "Arizona"     "Arkansas"   
 [6] "California"  "Colorado"    "Connecticut" "Delaware"    "Florida"    
[11] "Georgia"    

I can get the first part by:

unlist(strsplit(x, "[^0-9]*$"))

But want the second part.

Thank you in advance.

Tyler Rinker asked May 24 '12 05:05



1 Answer

You can do this in one easy step with a regular expression:

gsub("(^.*\\d+)(\\w*)", "\\2", x)

Results in:

 [1] "ID"          "Alabama"     "Alaska"      "Arizona"     "Arkansas"    "California"  "Colorado"    "Connecticut"
 [9] "Delaware"    "Florida"     "Georgia"  
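
If you would rather get the NA from the question for elements that contain no digit, here is a minimal sketch of one way to do it. It is not part of the original one-liner; the names has_digit and result are made up for illustration.

```r
x <- c("ID", paste0("X", 1:10, state.name[1:10]))

# TRUE for elements that contain at least one digit
has_digit <- grepl("[0-9]", x)

# Remove everything up to and including the last digit (".*" is greedy);
# elements without any digit become NA instead of being passed through.
result <- ifelse(has_digit, sub("^.*[0-9]", "", x), NA_character_)
result
```

The only difference from the gsub() call is the explicit check for a digit, which decides between the stripped string and NA.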

What the regex does:

  1. "(^.*\\d+)(\\w*)": Look for two groups of characters.
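    • The first group (^.*\\d+) matches, from the start of the string, any run of characters followed by at least one digit. Because .* is greedy, this group extends through the last digit in the string.
    • The second group (\\w*) matches zero or more word characters (letters, digits or underscore) that follow that last digit.
  2. The "\\2" as the second argument to gsub() means to replace the matched text with the second group that the regex found. A string with no digit, such as "ID", does not match at all and is returned unchanged.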
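
To see what each group captures, you can inspect a single element with regexec() and regmatches(). This is only an illustrative sketch; the variable names pattern and m are made up here.

```r
pattern <- "(^.*\\d+)(\\w*)"

# regexec() reports the full match plus each parenthesised group;
# regmatches() extracts the corresponding substrings.
m <- regmatches("X10Georgia", regexec(pattern, "X10Georgia"))[[1]]
m

# m[1] is the whole match, m[2] the first group, m[3] the second group
first_group  <- m[2]
second_group <- m[3]
```

The first group takes everything through the final digit, so the second group is left with only the trailing letters, which is what "\\2" puts back.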
Andrie answered Oct 14 '22 08:10
