I'm trying a regex lookahead in R with the following command:
sub(x = street.addresses, pattern = "\\s((?i)Street|(?i)St\\.?)(?=\\sNE)", replacement = " St")
My goal is to replace Street (or St.) with St when it's followed by a space and the directional NE (as in, "Northeast"). The lookahead seems like it couldn't be more straightforward, but I keep hitting an error:
Error in sub(x = streets, pattern = "\\s((?i)Street|(?i)St\\.?)(?=\\sNE)", :
  invalid regular expression '\s((?i)Street|(?i)St\.?)(?=\sNE)', reason 'Invalid regexp'
Versions of this without the lookahead work fine in R, but as soon as I add a lookahead of any sort to my search/replace, I hit the error. Other R regex functions, like grep, seem to have the same problem.
I've copied and pasted that regex into engines like https://regex101.com/ and it works fine there, so I'm confused. Am I missing something basic about regex in R?
EDIT:
Here's a copy direct from my console:
> street.addresses <- c("23 Charles Street NE","23 Charles St. NE")
> new.vec <- sub(x = street.addresses, pattern = "\\s((?i)Street|(?i)St\\.?)(?=\\sNE)", replacement = " St")
Error in sub(x = street.addresses, pattern = "\\s((?i)Street|(?i)St\\.?)(?=\\sNE)", :
  invalid regular expression '\s((?i)Street|(?i)St\.?)(?=\sNE)', reason 'Invalid regexp'
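R's default regex engine (TRE, which implements POSIX-style extended regular expressions) does not support lookaheads. You need to call sub in Perl mode with perl=TRUE, which switches to the PCRE engine, if you want to use a lookahead: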
street <- "123 Hudson Street NE, New York, NY"
sub(x = street, pattern = "\\s((?i)Street|(?i)St\\.?)(?=\\sNE)",
replacement = " St", perl=TRUE)
[1] "123 Hudson St NE, New York, NY"
By the way, if you pass the arguments to sub in their default positions (pattern, replacement, x), you can omit the names, which gives a terser call:
sub("\\s((?i)Street|(?i)St\\.?)(?=\\sNE)", " St", street, perl=TRUE)
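The same fix should carry over to the vector from the question and to the other regex functions mentioned there. The following is a minimal sketch, not a definitive implementation: it reuses the question's sample data, and the names fixed and has_street are hypothetical. It also swaps the repeated (?i) flags for a single scoped (?i:...) group, which is an assumption about the intended case-insensitivity, not something the question specifies.

```r
# Sample data taken from the question's console session.
street.addresses <- c("23 Charles Street NE", "23 Charles St. NE")

# perl = TRUE switches to the PCRE engine, which supports lookaheads.
# (?i:...) makes only the Street/St. alternatives case-insensitive;
# the lookahead for "NE" stays case-sensitive.
fixed <- sub("\\s(?i:Street|St\\.?)(?=\\sNE)", " St",
             street.addresses, perl = TRUE)
print(fixed)

# grepl (and grep, gsub, regexpr) accept the same perl argument.
# 'has_street' is a hypothetical name used only for this illustration.
has_street <- grepl("Street(?=\\sNE)", street.addresses, perl = TRUE)
print(has_street)
```

Both elements of fixed should come out as "23 Charles St NE", and has_street should be TRUE only for the first address, since the second one is already abbreviated.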