
Removing white-spaces conditioned on number of occurrences in R

I want to remove whitespace from a character string when the number of consecutive whitespace characters between two non-whitespace characters is less than a certain number.

For example,

a <- c("I want            only                <5                         white-spaces   removed")

I know I can remove all the spaces using gsub(' ', '', a). However, I want to remove the whitespace between two non-whitespace characters only when the run is shorter than 5 characters. So I want the following:

a_adj <- c("Iwant             only                <5                         white-spacesremoved")

I tried gsub('{,5} ', '', a), but it still removes all the whitespace. Can someone help, please?

Thanks

rm167 asked Mar 02 '23 11:03


1 Answer

You may use

a_adj <- gsub("(?<=\\S)\\s{1,4}(?=\\S)", "", a, perl=TRUE)


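The (?<=\S)\s{1,4}(?=\S) pattern matches a run of 1 to 4 whitespace characters only when that run sits directly between two non-whitespace characters. Longer runs are left alone because no 1-to-4-character slice of them has a non-whitespace character on both sides.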

Details

  • (?<=\S) - a positive lookbehind that requires a non-whitespace character immediately to the left of the current location
  • \s{1,4} - 1 to 4 whitespace characters
  • (?=\S) - a positive lookahead that requires a non-whitespace character immediately to the right of the current location.
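
To see the pattern in action on a shorter string, here is a minimal sketch. The helper name remove_short_gaps and its max_len parameter are hypothetical (they are not part of the original answer); the helper only wraps the same gsub() call so the upper bound of the quantifier can be changed.

```r
# Hypothetical helper (name and max_len are assumptions for illustration):
# removes whitespace runs of 1..max_len characters that sit directly
# between two non-whitespace characters.
remove_short_gaps <- function(x, max_len = 4) {
  # Build the same lookaround pattern, with the quantifier's upper bound filled in.
  pattern <- sprintf("(?<=\\S)\\s{1,%d}(?=\\S)", as.integer(max_len))
  # perl = TRUE is required because lookbehind/lookahead are PCRE features.
  gsub(pattern, "", x, perl = TRUE)
}

x <- "a b     c  d"   # gaps of 1, 5 and 2 spaces

remove_short_gaps(x)               # "ab     cd"   (the 5-space gap is kept)
remove_short_gaps(x, max_len = 1)  # "ab     c  d" (only the single space is removed)
```

With the default max_len = 4 this is equivalent to the one-liner above, so "less than 5" in the question maps to the {1,4} quantifier.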
Wiktor Stribiżew answered Mar 05 '23 16:03