I am relatively new to regular expressions and I am running into a dead end. I have a data frame with a column that looks like this:
year1
GMM14_2000_NGVA
GMM14_2001_NGVA
GMM14_2002_NGVA
...
GMM14_2014_NGVA
I am trying to extract the year in the middle of the string (2000, 2001, etc.). This is my code thus far:
gsub("[^0-9]", "", year1)
Which returns the number but it also returns the 14 that is part of the string:
142000
142001
Any idea on how to exclude the 14 from the pattern or how to extract the year information more efficiently?
Thanks
Use the following gsub:
s = "GMM14_2002_NGVA"
gsub("^[^_]*_|_[^_]*$", "", s)
The regex breakdown:
Match...
^[^_]*_ - 0 or more characters other than _ from the start of the string, followed by a _
| - or...
_[^_]*$ - a _ followed by 0 or more characters other than _ up to the end of the string
...and remove them.
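Since gsub is vectorized, the same pattern can be applied to the whole column at once. A minimal sketch, assuming a data frame named df (a hypothetical stand-in for the one in the question) with the year1 column stored as character:

```r
# Hypothetical data frame standing in for the one in the question
df <- data.frame(
  year1 = c("GMM14_2000_NGVA", "GMM14_2001_NGVA", "GMM14_2014_NGVA"),
  stringsAsFactors = FALSE
)

# Remove everything up to and including the first "_",
# and the last "_" together with everything after it
df$year <- gsub("^[^_]*_|_[^_]*$", "", df$year1)

# Convert to integer if the year is needed as a number
df$year <- as.integer(df$year)
```

Note that this relies on each value containing exactly two underscores; with more than two, the middle part left behind would still contain underscores.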
As an alternative,
library(stringr)
str_extract(s,"(?<=_)\\d{4}(?=_)")
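Here, the Perl-like regex matches a 4-digit substring enclosed in underscores: the lookbehind (?<=_) and the lookahead (?=_) require the underscores to be present without including them in the match.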
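If you would rather not load stringr, the same lookaround pattern should also work in base R with perl = TRUE. A rough sketch, where the vector s and the variable names are only illustrative:

```r
# Illustrative input vector (not from the original question)
s <- c("GMM14_2000_NGVA", "GMM14_2014_NGVA")

# regexpr finds the first match in each string; perl = TRUE enables lookarounds
m <- regexpr("(?<=_)\\d{4}(?=_)", s, perl = TRUE)

# regmatches pulls out the matched substrings
years <- regmatches(s, m)
```

One difference to keep in mind: regmatches silently drops elements that have no match, whereas str_extract returns NA for them, so the result may be shorter than the input.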