I have a strange regex request in R. I have a vector of character strings, some of which have multiple trailing periods. I want to replace each of these periods with a space. The example and desired outcome should make clear what I'm after (maybe I need to attack it through the replacement argument rather than the pattern argument of gsub):
Example and attempts:
x <- c("good", "little.bad", "really.ugly......")
gsub("\\.$", " ", x)
#produces this
#[1] "good" "little.bad" "really.ugly..... "
gsub("\\.+$", " ", x)
#produces this
#[1] "good" "little.bad" "really.ugly "
Desired outcome
[1] "good" "little.bad" "really.ugly      "
The last string in the original vector (x) ends with 6 periods, so I'd like 6 spaces there, without touching the period between really and ugly. I know $ anchors the match at the end of the string, but I can't get past this.
Try this, where mystring is your character vector (x in the question):
gsub("\\.(?=\\.*$)", " ", mystring, perl=TRUE)
Explanation:
\.        # Match a dot
(?=       # only if followed by
  \.*     # zero or more dots
  $       # until the end of the string
)         # End of lookahead assertion.
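To see the lookahead at work on the vector from the question, here is a minimal sketch. The variable name res is only for illustration. Because each trailing dot is matched and replaced one at a time, the string lengths should stay the same.

```r
x <- c("good", "little.bad", "really.ugly......")

# Replace a dot only when nothing but dots follows it up to the end of the string
res <- gsub("\\.(?=\\.*$)", " ", x, perl = TRUE)

# The dot inside "really.ugly" is followed by letters, so it is left alone;
# each of the six trailing dots becomes one space
print(res)

# One character in, one character out: lengths are unchanged
print(nchar(res) == nchar(x))
```

Note that perl = TRUE is required here, since lookahead assertions are not supported by R's default regex engine.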
While I waited for a regex solution that makes sense, I decided to come up with a nonsensical way to solve this:
messy.sol <- function(x) {
  # Strip the trailing periods, then append one space per period removed
  paste(unlist(list(gsub("\\.+$", "", x),
    rep(" ", nchar(x) - nchar(gsub("\\.+$", "", x))))), collapse = "")
}
sapply(x, messy.sol, USE.NAMES = FALSE)
I'd say Tim's is a bit prettier :)
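For what it's worth, the same strip-then-pad idea can probably be written without sapply. This is only a sketch, assuming R 3.3.0 or later for strrep; the function name pad_trailing_dots is made up for the example.

```r
# Hypothetical helper: strip trailing periods, then pad with as many spaces
pad_trailing_dots <- function(x) {
  stripped <- sub("\\.+$", "", x)          # remove the run of trailing periods
  n_removed <- nchar(x) - nchar(stripped)  # how many periods each string lost
  paste0(stripped, strrep(" ", n_removed)) # strrep is vectorized over times
}

x <- c("good", "little.bad", "really.ugly......")
print(pad_trailing_dots(x))
```

Strings with no trailing periods get strrep(" ", 0), which is an empty string, so they come back unchanged.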
Tim's solution is clearly better, but I figured I'd try my hand at an alternative. Liberal use of regmatches helps us out here:
x <- c("good", "little.bad", "really.ugly......")
# Get an object with 'match data' to feed into regmatches
# Here we match on any number of periods at the end of a string
out <- regexpr("\\.*$", x)
# On the right hand side we extract the pieces of the strings
# that match our pattern with regmatches and then replace
# all the periods with spaces. Then we use assignment
# to store that into the spots in our strings that match the
# regular expression.
regmatches(x, out) <- gsub("\\.", " ", regmatches(x, out))
x
#[1] "good" "little.bad" "really.ugly      "
It's not quite as clean as a single regular expression, but I've never really gotten around to learning lookaheads in Perl regular expressions.