Extract a year number from a string that is surrounded by special characters

Question

What's a good way to extract only the number 2007 from the following string:

some_string <- "1_2_start_2007_3_end"

The pattern to detect the year number in my case would be:

4 digits
surrounded by "_"

I am quite new to using regular expressions. I tried the following:

 regexp <- "_+[0-9]+_"
 names <- str_extract(files, regexp)

But this does not take into account that there are always 4 digits and outputs the underlines as well.

Wiktor Stribiżew · Accepted Answer

You can also use sub():

some_string <- "1_2_start_2007_3_end"
sub(".*_(\\d{4})_.*", "\\1", some_string)

Details

.* - any 0+ chars, as many as possible
_ - a _ char
(\d{4}) - Group 1 (referred to via \1 from the replacement pattern): 4 digits
_.* - a _ and then any 0+ chars up to the end of string.
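
One caveat with the sub() approach is that sub() returns its input unchanged when the pattern does not match, so a string without a year comes back whole. A minimal sketch of an alternative, assuming you would rather get NA in that case, captures the same group with base R's regexec() and regmatches(). The helper name extract_year is hypothetical and not part of the answer above:

```r
# Hypothetical helper (not from the original answer): capture the
# 4-digit group with regexec()/regmatches() instead of sub().
extract_year <- function(x) {
  m <- regmatches(x, regexec("_(\\d{4})_", x))
  # Each element of m is c(full_match, group_1), or character(0) if no match
  vapply(
    m,
    function(g) if (length(g) >= 2) g[2] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

extract_year(c("1_2_start_2007_3_end", "no_year_here"))
## -> "2007" NA
```

Note that regexec() reports the first (leftmost) match in each string.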

NOTE: akrun's str_extract(some_string, "(?<=_)\\d{4}") extracts the leftmost occurrence, while my sub(".*_(\\d{4})_.*", "\\1", some_string) extracts the rightmost occurrence of a 4-digit substring enclosed in _. For my solution to return the leftmost one, use a lazy quantifier on the first .*: sub(".*?_(\\d{4})_.*", "\\1", some_string).
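
The lookbehind pattern quoted in the note only checks the underscore before the digits. As a sketch, assuming the year must have an underscore on both sides, a lookahead can be added; the version below uses base R with perl = TRUE rather than stringr so that it runs without extra packages, and the second test string is made up for illustration:

```r
# The second element is a made-up example with a 5-digit run
x <- c("1_2_start_2007_3_end", "a_20071_b")

# Lookbehind only: also accepts the first four digits of a longer run
m1 <- regmatches(x, regexpr("(?<=_)\\d{4}", x, perl = TRUE))
m1
## -> "2007" "2007"

# Lookbehind and lookahead: the four digits must sit between two underscores
m2 <- regmatches(x, regexpr("(?<=_)\\d{4}(?=_)", x, perl = TRUE))
m2
## -> "2007"
```

regmatches() drops elements without a match here, so m2 has one element only.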

R test:

some_string <- "1_2018_start_2007_3_end"
sub(".*?_(\\d{4})_.*", "\\1", some_string) # leftmost
## -> 2018
sub(".*_(\\d{4})_.*", "\\1", some_string) # rightmost
## -> 2007
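
If every enclosed 4-digit number is wanted rather than only the leftmost or rightmost one, a possible sketch uses gregexpr() with lookarounds. This is an addition for illustration, not part of the original answer; because lookarounds do not consume the underscores, two years sharing one underscore (for example "_2018_2007_") would both be found:

```r
some_string <- "1_2018_start_2007_3_end"

# gregexpr() finds all matches; [[1]] takes the result for the single input string
all_years <- regmatches(
  some_string,
  gregexpr("(?<=_)\\d{4}(?=_)", some_string, perl = TRUE)
)[[1]]
all_years
## -> "2018" "2007"
```

head(all_years, 1) and tail(all_years, 1) then give the leftmost and rightmost values.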

Tags: regex, r

Patrick Balada
