
regular expression to strip leading characters up to first encountered digit

Tags:

regex

r

I have a string named thisLine and I'd like to remove all characters before the first digit. I can use the command

regexpr("[0123456789]",thisLine)[1]

to determine the position of the first digit. How do I use that index to split the string?

user984165 asked Dec 21 '12 21:12


3 Answers

The short answer:

sub('^\\D*', '', thisLine)

where

  • ^ matches the beginning of the string
  • \\D matches any non-digit (it is the opposite of \\d)
  • \\D* tries to match as many consecutive non-digits as possible
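
Because sub() is vectorised, the same call handles a whole character vector at once. A minimal sketch, assuming some made-up sample strings (the names x and stripped are illustrative, not from the question):

```r
# Made-up sample strings for illustration.
x <- c("abc123def", "42 is the answer")

# Strip every leading non-digit from each element in one call.
stripped <- sub("^\\D*", "", x)

# stripped[1] is "123def"; stripped[2] is unchanged because it
# already starts with a digit (\\D* matches zero characters there).
print(stripped)
```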

flodel answered Oct 08 '22 21:10


My personal preference, skipping regexpr altogether:

sub("^.*?(\\d)","\\1",thisLine)
# breaking down the regex
# ^    beginning of the string
# .    any character
# *    repeated any number of times (including 0)
# ?    lazy (minimal) quantifier: makes the preceding * match as few characters as possible
# ()   captures the digit
# \\d  digit
# \\1  backreference to the first captured group (the digit)
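
The two patterns differ when the string contains no digit at all. A small sketch of that edge case, with made-up inputs and illustrative variable names:

```r
# Made-up input that contains no digit at all.
noDigit <- "abc"

# ^\\D* still matches (it consumes every non-digit), so nothing is left.
stripResult <- sub("^\\D*", "", noDigit)

# ^.*?(\\d) needs a digit to match; without one the string comes back unchanged.
captureResult <- sub("^.*?(\\d)", "\\1", noDigit)

# When a digit is present, both approaches give the same result.
withDigit <- "abc1def"
same <- identical(sub("^\\D*", "", withDigit),
                  sub("^.*?(\\d)", "\\1", withDigit))
```

Here stripResult is an empty string, captureResult is still "abc", and same is TRUE. Which behaviour is preferable depends on how digit-free lines should be treated.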

Blue Magister answered Oct 08 '22 21:10


You want the substring function.
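
A minimal sketch of that route, reusing the regexpr call from the question. The sample value of thisLine and the names pos and result are assumptions for illustration:

```r
# Hypothetical sample input; any character string works.
thisLine <- "abc123def"

# Position of the first digit (regexpr returns -1 if there is none).
pos <- regexpr("[0-9]", thisLine)[1]

# Keep everything from that position to the end of the string;
# leave the string untouched when no digit was found.
result <- if (pos > 0) substring(thisLine, pos) else thisLine
print(result)
```

The explicit check on pos is a design choice of this sketch: it returns the original string when no digit is present, rather than passing -1 to substring.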

Or use gsub to do the work in one shot:

> gsub('^[^[:digit:]]*[[:digit:]]', '', 'abc1def')
[1] "def"

You may want to include that first digit, which can be done with a capture:

> gsub('^[^[:digit:]]*([[:digit:]])', '\\1', 'abc1def')
[1] "1def"

Or, as flodel and Alan indicate, simply replace all leading non-digits with an empty string. See flodel's answer.

Matthew Lundberg answered Oct 08 '22 20:10