What's a good way to extract only the number 2007 from the following string:
some_string <- "1_2_start_2007_3_end"
The pattern to detect the year number in my case would be:
I am quite new to using regular expressions. I tried the following:
regexp <- "_+[0-9]+_"
names <- str_extract(files, regexp)
But this does not take into account that there are always 4 digits, and it outputs the underscores as well.
You may use a sub() option, too:
some_string <- "1_2_start_2007_3_end"
sub(".*_(\\d{4})_.*", "\\1", some_string)
Details

.* - any 0+ chars, as many as possible
_ - a _ char
(\\d{4}) - Group 1 (referred to via \1 from the replacement pattern): 4 digits
_.* - a _ and then any 0+ chars up to the end of the string.

NOTE: akrun's str_extract(some_string, "(?<=_)\\d{4}") will extract the leftmost occurrence and my sub(".*_(\\d{4})_.*", "\\1", some_string) will extract the rightmost occurrence of a 4-digit substring enclosed with _. For my solution to return the leftmost one, use a lazy quantifier with the first . (that is, .*? instead of .*): sub(".*?_(\\d{4})_.*", "\\1", some_string).
R test:
some_string <- "1_2018_start_2007_3_end"
sub(".*?_(\\d{4})_.*", "\\1", some_string) # leftmost
## -> 2018
sub(".*_(\\d{4})_.*", "\\1", some_string) # rightmost
## -> 2007
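One caveat with the sub() approach: when the pattern does not match at all, sub() returns the input string unchanged rather than a missing value. If that matters, a base R sketch along the following lines may help. The helper name extract_year is hypothetical (it is not part of the original answer), and it assumes the year is always a run of exactly 4 digits with an underscore on each side.

```r
# Hypothetical helper (an assumption, not from the original answer): return the
# leftmost 4-digit run enclosed by underscores, or NA when there is none.
extract_year <- function(x) {
  # perl = TRUE enables the lookarounds (?<=_) and (?=_), so the underscores
  # are required around the digits but are not part of the match.
  m <- regexpr("(?<=_)\\d{4}(?=_)", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regexpr() reports -1 for elements without a match; fill in only the hits.
  out[m != -1] <- regmatches(x, m)
  out
}

extract_year(c("1_2_start_2007_3_end", "no_year_here"))
## -> "2007" NA

# For comparison, sub() leaves a non-matching string untouched:
sub(".*_(\\d{4})_.*", "\\1", "no_year_here")
## -> "no_year_here"
```

Like str_extract(), regexpr() reports the leftmost match, so for "1_2018_start_2007_3_end" this sketch would give 2018.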