I am trying to replace commas that have a non-whitespace character on both sides with a single space, while leaving all other commas untouched (in R).
Imagine I have:
j<-"Abc,Abc, and c"
and I want:
"Abc Abc, and c"
This almost works:
gsub("[^ ],[^ ]"," " ,j)
But it removes the characters either side of the commas to give:
"Ab bc, and c"
You may use a PCRE regex with a negative lookbehind and lookahead:
j <- "Abc,Abc, and c"
gsub("(?<!\\s),(?!\\s)", " ", j, perl = TRUE)
## => [1] "Abc Abc, and c"
See the regex demo
Details:
- (?<!\\s) - there cannot be a whitespace right before the ,
- , - a literal ,
- (?!\\s) - there cannot be a whitespace right after the ,
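As a rough check of the lookaround pattern on a few more inputs, here is a minimal sketch. The helper name strip_tight_commas and the extra test strings are made up for illustration and are not part of the original question.

```r
# Hypothetical helper wrapping the lookaround pattern from the answer above.
strip_tight_commas <- function(x) {
  gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
}

# Illustrative inputs (assumed, not from the question).
inputs <- c("Abc,Abc, and c", "a,b,c", "x ,y", "tab,\tsep")
result <- strip_tight_commas(inputs)

# Lookarounds do not consume the neighbouring characters, so back-to-back
# commas such as "a,b,c" are all replaced. A comma next to a space or a tab
# is left alone because \s covers any whitespace character.
print(result)
```

Because \\s matches tabs and newlines as well as spaces, this version treats a comma followed by a tab the same way as a comma followed by a space.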
An alternative solution is to match a , that is enclosed with word boundaries:
j <- "Abc,Abc, and c"
gsub("\\b,\\b", " ", j)
## => [1] "Abc Abc, and c"
See another R demo.
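One thing to keep in mind with the word-boundary version: \\b only sits between a word character (letter, digit, underscore) and a non-word character, so it is narrower than "not whitespace". The sketch below compares the two patterns. The second input string is an assumed example, and perl = TRUE is used for both calls so they run on the same engine.

```r
# First string is from the question; the second is an assumed example where
# the comma follows a non-word character, ")".
x <- c("Abc,Abc, and c", "f(x),y")

# Word-boundary version: the comma must sit between two word characters.
by_boundary <- gsub("\\b,\\b", " ", x, perl = TRUE)

# Lookaround version: the comma must not touch whitespace on either side.
by_lookaround <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(by_boundary)    # "f(x),y" keeps its comma: there is no boundary between ")" and ","
print(by_lookaround)  # "f(x),y" becomes "f(x) y"
```

If the data only ever has letters or digits around the commas, the two approaches should give the same result.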
You can use back references like this:
gsub("([^ ]),([^ ])","\\1 \\2" ,j)
[1] "Abc Abc, and c"
The () in the regular expression capture the characters adjacent to the comma. The \\1 and \\2 in the replacement insert these captured values back in the order they were captured.
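A caveat with the back-reference approach, shown as a small sketch: the captured characters are consumed by each match, so when commas are only one character apart the second comma can be skipped. The input "a,b,c" is an assumed example, not from the question.

```r
# Assumed example with single-character fields between commas.
x <- "a,b,c"

# Back-reference version: the first match "a,b" consumes "b", so "b" is no
# longer available as the left-hand neighbour of the second comma.
by_backref <- gsub("([^ ]),([^ ])", "\\1 \\2", x)

# Lookaround version for comparison: nothing around the comma is consumed.
by_lookaround <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(by_backref)     # "a b,c"
print(by_lookaround)  # "a b c"
```

For the string in the question both give the same answer, since there the fields between commas are longer than one character.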
We can try
gsub(",(?=[^ ])", " ", j, perl = TRUE)
#[1] "Abc Abc, and c"
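Note that this pattern only looks at the character after the comma. A short sketch of where that differs from checking both sides follows. The two input strings are assumed examples chosen to show the difference.

```r
# Assumed examples: a comma preceded by a space, and a trailing comma.
x <- c("x ,y", "end,")

# Lookahead only: requires a non-space character after the comma.
ahead_only <- gsub(",(?=[^ ])", " ", x, perl = TRUE)

# Negative lookbehind and lookahead: forbids whitespace on either side.
both_sides <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(ahead_only)  # "x  y" (comma after a space is replaced), "end," unchanged
print(both_sides)  # "x ,y" unchanged, "end " (trailing comma is replaced)
```

Which behaviour is preferable depends on how commas next to a leading space or at the end of the string should be treated.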