
Regular expression matching a comma bounded by non-whitespace characters

I am trying to replace commas that have a non-whitespace character on both sides with a single space, while leaving all other commas untouched (in R).

Imagine I have:

j <- "Abc,Abc, and c"

and I want:

"Abc Abc, and c"

This almost works:

gsub("[^ ],[^ ]", " ", j)

But it also removes the characters on either side of the comma, giving:

"Ab bc, and c"
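
To see why the neighbouring characters disappear, it may help to look at what the pattern actually matches. The sketch below uses base R's regmatches() and gregexpr(); the variable name matched is only for illustration.

```r
j <- "Abc,Abc, and c"

# Extract every substring matched by the original pattern.
# Each [^ ] consumes one character, so the match is three characters wide.
matched <- regmatches(j, gregexpr("[^ ],[^ ]", j))[[1]]
print(matched)
# [1] "c,A"
```

Because the whole three-character match is replaced by a single space, the "c" and the "A" are lost along with the comma.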
asked Mar 01 '17 12:03 by tsutsume



3 Answers

You may use a PCRE regex with a negative lookbehind and lookahead:

j <- "Abc,Abc, and c"
gsub("(?<!\\s),(?!\\s)", " ", j, perl = TRUE)
## => [1] "Abc Abc, and c"


Details:

  • (?<!\\s) - there cannot be a whitespace right before a ,
  • , - a literal ,
  • (?!\\s) - there cannot be a whitespace right after a ,
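
Since both lookarounds are zero-width, only the comma itself is consumed, so neighbouring characters are kept and adjacent matches do not interfere with each other. A minimal sketch on a few extra inputs (the test strings in x are made up for illustration and are not from the question):

```r
x <- c("Abc,Abc, and c", "a,b,c", "x ,y", "x, y")

# Replace only commas with no whitespace immediately before or after them.
res_lookaround <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
print(res_lookaround)
# Elements, in order: "Abc Abc, and c", "a b c", "x ,y", "x, y"
```

The last two strings are unchanged because their comma touches a space on one side.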

An alternative solution is to match a , that is enclosed by word boundaries:

j <- "Abc,Abc, and c"
gsub("\\b,\\b", " ", j)
## => [1] "Abc Abc, and c"
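
The two patterns are not interchangeable on every input. \b only matches between a word character (letter, digit, or underscore) and a non-word character, so a comma sitting next to punctuation is skipped by the word-boundary version. A small sketch of the difference follows; the second test string is invented for illustration, and perl = TRUE is passed to both calls here (an assumption made so that both run on the same engine, whereas the call above uses R's default engine).

```r
x <- c("Abc,Abc, and c", "(a),(b)")

# Word-boundary version: needs a word character on both sides of the comma.
res_boundary <- gsub("\\b,\\b", " ", x, perl = TRUE)
print(res_boundary)
# Elements, in order: "Abc Abc, and c", "(a),(b)"

# Lookaround version: only requires that neither neighbour is whitespace.
res_lookaround <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
print(res_lookaround)
# Elements, in order: "Abc Abc, and c", "(a) (b)"
```

Which one is preferable depends on whether commas next to punctuation should count as "bounded by non-whitespace".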


answered Nov 15 '22 06:11 by Wiktor Stribiżew


You can use back references like this:

gsub("([^ ]),([^ ])", "\\1 \\2", j)
# [1] "Abc Abc, and c"

The parentheses in the regular expression capture the characters on either side of the comma. In the replacement, \\1 and \\2 insert those captured characters back, in the order they were captured.
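
One caveat worth knowing: the capture groups consume the neighbouring characters, and matches cannot overlap, so a single character sitting between two commas can only take part in one match. The sketch below illustrates this with an invented input ("a,b,c" is not from the question).

```r
# Works for the question's input, where the tokens are longer than one character.
res_multi <- gsub("([^ ]),([^ ])", "\\1 \\2", "Abc,Abc, and c")
print(res_multi)
# [1] "Abc Abc, and c"

# With single-character tokens, "b" is consumed by the first match,
# so the second comma has no unconsumed character left before it.
res_single <- gsub("([^ ]),([^ ])", "\\1 \\2", "a,b,c")
print(res_single)
# [1] "a b,c"
```

If inputs like this can occur, a zero-width pattern such as gsub("(?<!\\s),(?!\\s)", " ", "a,b,c", perl = TRUE) avoids the problem because it consumes only the comma.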

answered Nov 15 '22 07:11 by lmo


We can try a positive lookahead:

gsub(",(?=[^ ])", " ", j, perl = TRUE)
#[1] "Abc Abc, and c"
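
Note that this pattern only inspects the character after the comma, so a comma preceded by a space is still replaced. A possible stricter variant, shown here as a sketch rather than as part of the answer above, adds a positive lookbehind so that a non-space character is required on both sides. The extra test strings are made up for illustration.

```r
x <- c("Abc,Abc, and c", "x ,y", "a,b,")

# Lookahead only: the comma in "x ,y" is replaced even though a space precedes it.
res_ahead <- gsub(",(?=[^ ])", " ", x, perl = TRUE)
print(res_ahead)
# Elements, in order: "Abc Abc, and c", "x  y", "a b,"

# Lookbehind plus lookahead: a non-space character must sit on both sides.
res_both <- gsub("(?<=[^ ]),(?=[^ ])", " ", x, perl = TRUE)
print(res_both)
# Elements, in order: "Abc Abc, and c", "x ,y", "a b,"
```

In both versions the trailing comma in "a,b," is kept, because a positive lookahead needs an actual character after the comma.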
answered Nov 15 '22 06:11 by akrun