I'm looking to extract the year from a string. The year always comes after an 'X' and before a ".", which is followed by a string of other characters.
Using stringr's str_extract, I'm trying the following:
year = str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
                   pattern = 'X(\\d{4})\\.')
I thought the parentheses would define the capture group, returning 2015, but I actually get the complete match X2015.
Am I doing this correctly? Why am I not trimming "X" and "."?
I believe the most idiomatic way is to use str_match:
str_match(string = 'X2015.XML.Outgoing.pounds..millions.',
pattern = 'X(\\d{4})\\.')
This returns a matrix with the complete match in the first column, followed by one column per capture group:
     [,1]     [,2]
[1,] "X2015." "2015"
As such, selecting the second column will do the trick:
str_match(string = 'X2015.XML.Outgoing.pounds..millions.',
          pattern = 'X(\\d{4})\\.')[, 2]
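Because str_match returns one row per input string, the same approach extends to a whole vector of names. The following is a minimal sketch; only the first element of cols comes from the question, the other two are made-up values for illustration.

```r
library(stringr)

# Hypothetical column names; only the first one appears in the question.
cols <- c("X2015.XML.Outgoing.pounds..millions.",
          "X2016.XML.Incoming.pounds..millions.",
          "Total")

# One row per input string: column 1 holds the full match,
# column 2 holds the first capture group.
m <- str_match(cols, "X(\\d{4})\\.")

# Strings that do not match the pattern yield NA.
years <- m[, 2]
years
# [1] "2015" "2016" NA
```

Selecting the column with [, 2] rather than [2] matters here: on a matrix with several rows, [2] would pick the second full match instead of the capture group. As far as I know, stringr 1.5.0 and later also let str_extract return a capture group directly through a group argument, but check the documentation of your installed version before relying on that.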
The capture group is irrelevant in this case: str_extract always returns the whole match, including the characters before and after the capture group.
To get only the digits, use a lookbehind and a lookahead instead. Both are zero-width assertions, so they constrain where the match may occur without becoming part of it.
library(stringr)
str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
pattern = '(?<=X)\\d{4}(?=\\.)')
# [1] "2015"
This regex matches four consecutive digits that are preceded by an X and followed by a literal dot.
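The zero-width behaviour can be checked without stringr. This is a small sketch in base R, assuming the PCRE engine is enabled with perl = TRUE (lookarounds need it in base R): the reported match starts at the first digit and is exactly four characters long, so neither the X nor the dot is counted.

```r
string <- "X2015.XML.Outgoing.pounds..millions."

# perl = TRUE switches to PCRE, which supports lookbehind and lookahead.
m <- regexpr("(?<=X)\\d{4}(?=\\.)", string, perl = TRUE)

as.integer(m)            # start position of the match
# [1] 2
attr(m, "match.length")  # length of the match: the four digits only
# [1] 4

regmatches(string, m)
# [1] "2015"
```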
Alternatively, you can use gsub:
string = 'X2015.XML.Outgoing.pounds..millions.'
gsub("X(\\d{4})\\..*", "\\1", string)
# [1] "2015"
or str_replace from stringr:
library(stringr)
str_replace(string, "X(\\d{4})\\..*", "\\1")
# [1] "2015"
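One caveat with the replacement approach: when the pattern does not match, gsub and str_replace return the input unchanged instead of NA. The sketch below shows this and one possible guard using grepl; the second element of x is a made-up value for illustration.

```r
# The second element is hypothetical and does not match the pattern.
x <- c("X2015.XML.Outgoing.pounds..millions.", "Total")
pattern <- "X(\\d{4})\\..*"

# Non-matching input passes through untouched.
replaced <- gsub(pattern, "\\1", x)
replaced
# [1] "2015"  "Total"

# Guard: keep the replacement only where the pattern actually matched.
guarded <- ifelse(grepl(pattern, x), replaced, NA_character_)
guarded
# [1] "2015" NA
```

With str_match or str_extract the guard is unnecessary, since both already return NA for strings that do not match.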