
Eliminate space before period unless followed by a digit

Tags: regex, r

How can I use R's regex to eliminate space(s) before period(s) unless period is followed by a digit?

Here's what I have and what I've tried:

x <- c("I have .32 dollars AKA 32 cents . ", 
    "I have .32 dollars AKA 32 cents .  Hello World .")

gsub("(\\s+)(?=\\.+)", "", x, perl=TRUE)
gsub("(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)

Both calls give the same result, which also removes the space before .32:

## [1] "I have.32 dollars AKA 32 cents. "             
## [2] "I have.32 dollars AKA 32 cents.  Hello World."

I'd like to get:

## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."

I'm saddled with gsub here, but other solutions are welcome to make the question more useful to future searchers.

Tyler Rinker asked Dec 20 '22 11:12


1 Answer

You don't need a complex expression; a positive lookahead is enough here.

> gsub(' +(?=\\.(?:\\D|$))', '', x, perl=TRUE)
## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."

Explanation:

 +        # ' ' (1 or more times)
(?=       # look ahead to see if there is:
  \.      #   '.'
  (?:     #   group, but do not capture:
    \D    #      non-digits (all but 0-9)
   |      #     OR
    $     #      before an optional \n, and the end of the string
  )       #   end of grouping
)         # end of look-ahead
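
As a quick check of the breakdown above, here is a small sketch that runs the pattern on the question's x and compares it with a negative-lookahead spelling. The variant ' +(?=\\.(?!\\d))' and the names a and b are my own additions for illustration, not part of the original pattern; it reads as "a period that is not followed by a digit" and should behave the same way on this input.

```r
x <- c("I have .32 dollars AKA 32 cents . ",
       "I have .32 dollars AKA 32 cents .  Hello World .")

# Positive lookahead from above: period followed by a non-digit or end of string
a <- gsub(" +(?=\\.(?:\\D|$))", "", x, perl = TRUE)

# Assumed alternative spelling: period NOT followed by a digit
b <- gsub(" +(?=\\.(?!\\d))", "", x, perl = TRUE)

print(a)
print(identical(a, b))
```

Which spelling to use is mostly a matter of taste; the negative lookahead avoids the explicit end-of-string branch.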

Note: If these space characters could be any type of whitespace, replace ' +' with '\\s+' in the pattern.
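
A minimal sketch of that whitespace variant, using a made-up test string y (not from the question) that mixes tabs and runs of spaces:

```r
# Hypothetical input: a tab before a period, two spaces before a period,
# and a tab before a decimal number that should be left alone
y <- "Tabs\t. and wide  gaps  . but not\t.5"

res <- gsub("\\s+(?=\\.(?:\\D|$))", "", y, perl = TRUE)
print(res)
## [1] "Tabs. and wide  gaps. but not\t.5"
```

Keep in mind that \s also matches newlines, so this version would join a line that starts with a period onto the previous one.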


If you are content with using the (*SKIP)(*F) backtracking verbs, here is one way to write it:

> gsub(' \\.\\d(*SKIP)(*F)| +(?=\\.)', '', x, perl=TRUE)
## [1] "I have .32 dollars AKA 32 cents. "             
## [2] "I have .32 dollars AKA 32 cents.  Hello World."
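
One caveat, as far as I can tell: the skipped branch in that pattern covers only a single space, so a run of several spaces before a decimal would still be removed by the second branch. The sketch below uses a made-up string z and a variant with ' +' in the skipped branch; both the input and the variant are my own assumptions, not part of the pattern as posted.

```r
# Hypothetical input with three spaces before the decimal
z <- "I have   .32 dollars  ."

# Pattern as posted: the three spaces before .32 are matched by the second branch
posted <- gsub(" \\.\\d(*SKIP)(*F)| +(?=\\.)", "", z, perl = TRUE)

# Assumed variant: the skipped branch consumes the whole run of spaces
variant <- gsub(" +\\.\\d(*SKIP)(*F)| +(?=\\.)", "", z, perl = TRUE)

print(posted)
## [1] "I have.32 dollars."
print(variant)
## [1] "I have   .32 dollars."
```

On the x from the question, which only has single spaces before .32, both patterns give the same result.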

hwnd answered Jan 19 '23 11:01