
replace trailing periods with spaces

Tags: regex, r

I have a strange request involving regex in R. I have a vector of character strings, some of which end in multiple trailing periods, and I want to replace those periods with spaces. The example and desired outcome below should make clear what I'm after (maybe I need to attack this through the replacement argument of gsub rather than the pattern argument):

Example and attempts:

x <- c("good", "little.bad", "really.ugly......")
gsub("\\.$", " ", x)
  #produces this
  #[1] "good"              "little.bad"        "really.ugly..... "
gsub("\\.+$", " ", x)
  #produces this
  #[1] "good"         "little.bad"   "really.ugly "

Desired outcome:

[1] "good"              "little.bad"        "really.ugly      "

The last string in the original vector (x) ends with 6 periods, so I'd like 6 spaces in their place, without touching the period between really and ugly. I know $ anchors the match to the end of the string, but I can't get past this.
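A quick way to see why the second attempt collapses the run is to inspect the match itself. The sketch below (using only base R's `regexpr` and `gsub`) shows that `\\.+$` treats all six trailing periods as one match, and `gsub` inserts the replacement string once per match, not once per matched character.

```r
x <- c("good", "little.bad", "really.ugly......")

# "\\.+$" matches the whole run of trailing periods as a single match;
# -1 means "no match" for the first two strings
m <- regexpr("\\.+$", x)
as.integer(attr(m, "match.length"))
#[1] -1 -1  6

# gsub() substitutes the replacement once per match, so six periods become one space
gsub("\\.+$", " ", x)[3]
#[1] "really.ugly "
```

So the pattern needs to match each trailing period on its own, or the number of spaces has to be computed from the match length.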

Tyler Rinker asked Aug 31 '12 21:08


3 Answers

Try this:

gsub("\\.(?=\\.*$)", " ", x, perl=TRUE)

Explanation:

\.   # Match a dot
(?=  # only if followed by
 \.* # zero or more dots
 $   # until the end of the string
)    # End of lookahead assertion.
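Applied to the vector from the question, plus one extra all-periods string added here purely for illustration, the lookahead replaces every trailing period individually while the interior period survives, because "ugly......" does not consist only of periods. `perl = TRUE` is needed since R's default regex engine does not support lookaheads.

```r
x <- c("good", "little.bad", "really.ugly......", "...")

# Replace each period that is followed only by periods up to the end of the string.
# The lookahead (?=...) checks what follows without consuming it, so every
# trailing period is its own match and gets its own space.
gsub("\\.(?=\\.*$)", " ", x, perl = TRUE)
```

The first three results match the desired outcome in the question, and "..." becomes three spaces.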
Tim Pietzcker answered Nov 20 '22 07:11


While I waited for a regex solution that makes sense, I decided to come up with a nonsensical way to solve this:

messy.sol <- function(x) {
  # strip the trailing periods, then paste back one space per removed period
  paste(unlist(list(gsub("\\.+$", "", x),
      rep(" ", nchar(x) - nchar(gsub("\\.+$", "", x))))), collapse = "")
}

sapply(x, messy.sol, USE.NAMES = FALSE)

I'd say Tim's is a bit prettier :)
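The same count-and-pad idea can be written without `sapply`, since `sub`, `nchar`, `strrep` and `paste0` are all vectorised. This is a sketch, not part of the original answer: the helper name `pad_trailing_periods` is hypothetical, and `strrep` assumes R 3.3.0 or later.

```r
# Hypothetical helper: a vectorised take on messy.sol (assumes R >= 3.3.0 for strrep)
pad_trailing_periods <- function(x) {
  stripped <- sub("\\.+$", "", x)          # remove the run of trailing periods
  n_removed <- nchar(x) - nchar(stripped)  # how many periods each string lost
  paste0(stripped, strrep(" ", n_removed)) # append the same number of spaces
}

x <- c("good", "little.bad", "really.ugly......")
pad_trailing_periods(x)
#[1] "good"              "little.bad"        "really.ugly      "
```

Strings without trailing periods get `strrep(" ", 0)`, an empty string, so they pass through unchanged.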

Tyler Rinker answered Nov 20 '22 08:11


Tim's solution is clearly better, but I figured I'd try my hand at an alternate approach. Liberal use of regmatches helps us out here:

x <- c("good", "little.bad", "really.ugly......")
# Get an object with 'match data' to feed into regmatches
# Here we match on any number of periods at the end of a string
out <- regexpr("\\.*$", x)

# On the right hand side we extract the pieces of the strings
# that match our pattern with regmatches and then replace
# all the periods with spaces.  Then we use assignment
# to store that into the spots in our strings that match the
# regular expression.
regmatches(x, out) <- gsub("\\.", " ", regmatches(x, out))
x
#[1] "good"              "little.bad"        "really.ugly      "

So it's not quite as clean as a single regular expression, but I've never really gotten around to learning those lookaheads in Perl regular expressions.
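To see what the assignment above works with, it can help to print the intermediate matches. Because `\\.*$` can match zero characters, every element gets a (possibly empty) match. The second step is a variation not in the original answer: it swaps the inner `gsub` for base R's `chartr`, which translates characters one for one and is enough here since the matched pieces contain only periods.

```r
x <- c("good", "little.bad", "really.ugly......")
out <- regexpr("\\.*$", x)

# Every element has a match; strings with no trailing periods match the empty string
regmatches(x, out)
#[1] ""       ""       "......"

# Translate each "." in the matched pieces to " " and write them back in place
regmatches(x, out) <- chartr(".", " ", regmatches(x, out))
x
#[1] "good"              "little.bad"        "really.ugly      "
```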

Dason answered Nov 20 '22 09:11