Regular expression matching on comma bounded by nonwhite space

Tags:

regex

r

regular-language

I am trying to replace commas bounded by non-whitespace characters with a space, while leaving other commas untouched (in R).

Imagine I have:

j<-"Abc,Abc, and c"

and I want:

"Abc Abc, and c"

This almost works:

gsub("[^ ],[^ ]"," " ,j)

But it also removes the characters on either side of the comma, giving:

"Ab bc, and c"


asked Mar 01 '17 12:03

tsutsume

3 Answers

You may use a PCRE regex with a negative lookbehind and lookahead:

j <- "Abc,Abc, and c"
gsub("(?<!\\s),(?!\\s)", " ", j, perl = TRUE)
## => [1] "Abc Abc, and c"


Details:

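(?<!\\s) - a negative lookbehind: there cannot be a whitespace character right before the ,
, - a literal ,
(?!\\s) - a negative lookahead: there cannot be a whitespace character right after the ,

Neither lookaround consumes any characters, so only the comma itself is replaced. The backslash is doubled because the pattern is written as an R string literal.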
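
One edge case may be worth checking: a negative lookaround also succeeds at the very start or end of the string, so a leading or trailing comma is replaced as well. The sketch below compares the pattern above with a positive-lookaround variant, `(?<=\\S),(?=\\S)`, which is an assumption of this example and not part of the original answer. The extra test strings are made up for illustration.

```r
# Test strings: only the first comes from the question, the others are illustrative.
x <- c("Abc,Abc, and c", ",leading", "trailing,")

# Negative lookarounds: "no whitespace next to the comma".
# This condition is also satisfied at the edges of the string.
neg <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

# Positive lookarounds (assumed variant): require an actual
# non-whitespace character on both sides of the comma.
pos <- gsub("(?<=\\S),(?=\\S)", " ", x, perl = TRUE)

print(neg)
print(pos)
```

Both versions give "Abc Abc, and c" for the question's input. They differ only on the last two strings: `neg` replaces the leading and trailing commas with a space, while `pos` leaves them in place.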

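An alternative solution is to match a , that is enclosed by word boundaries. This works when the characters on both sides of the comma are word characters (letters, digits or _):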

j <- "Abc,Abc, and c"
gsub("\\b,\\b", " ", j)
## => [1] "Abc Abc, and c"

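
To see where the word-boundary version and the lookaround version diverge, here is a small sketch with made-up inputs in which the comma sits next to punctuation. Both calls use perl = TRUE so that the two patterns run on the same engine; that choice is an assumption of this example, not something the answer requires.

```r
# Illustrative inputs: only the first comes from the question.
x <- c("Abc,Abc, and c", "(a),(b)", "x,-1")

# Word boundaries: the comma must have a word character on each side.
wb <- gsub("\\b,\\b", " ", x, perl = TRUE)

# Lookarounds: the comma must simply not touch whitespace.
la <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(wb)
print(la)
```

For the question's string the two agree. For "(a),(b)" and "x,-1" the word-boundary pattern leaves the comma alone because at least one neighbour is punctuation, while the lookaround pattern replaces it.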


answered Nov 15 '22 06:11

Wiktor Stribiżew

You can use back references like this:

gsub("([^ ]),([^ ])", "\\1 \\2", j)
## [1] "Abc Abc, and c"

The () in the regular expression capture the characters adjacent to the comma. The \\1 and \\2 in the replacement insert these captured values back, in the order they were captured, so only the comma itself is swapped for a space.
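
One caveat with this approach, shown here as a small sketch on a made-up input: because each match consumes the character on both sides of the comma, two commas separated by a single character cannot both match. A lookaround pattern does not consume its neighbours, so it handles that case.

```r
# Illustrative input with single-character fields (not from the question).
s <- "a,b,c"

# Back references: the first match consumes "a,b", so the "b" is no
# longer available as the left-hand neighbour of the second comma.
backref <- gsub("([^ ]),([^ ])", "\\1 \\2", s)

# Lookarounds: zero-width checks, so every qualifying comma is replaced.
look <- gsub("(?<!\\s),(?!\\s)", " ", s, perl = TRUE)

print(backref)
print(look)
```

Here `backref` comes out as "a b,c" and `look` as "a b c". For inputs like the one in the question, where the fields are longer than one character, both give the same result.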

answered Nov 15 '22 07:11

lmo

We can try a lookahead that requires a character other than a space right after the comma:

gsub(",(?=[^ ])", " ", j, perl = TRUE)
#[1] "Abc Abc, and c"
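
Note that this pattern only inspects the character after the comma, and `[^ ]` excludes just the space character. The sketch below, using made-up inputs, contrasts it with a pattern that checks both sides using `\\s`. It is meant as an illustration of the difference, not as a judgement on which behaviour is wanted.

```r
# Illustrative inputs: only the first comes from the question.
x <- c("Abc,Abc, and c", "a ,b", "a,\tb")

# Lookahead only: the comma must be followed by a character other than a space.
ahead <- gsub(",(?=[^ ])", " ", x, perl = TRUE)

# Both sides, any whitespace: no space or tab is allowed next to the comma.
both <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)

print(ahead)
print(both)
```

With these inputs, `ahead` replaces the comma in "a ,b" (there is a space before it, which is not checked) and in "a,\tb" (a tab is not a space), while `both` leaves those two strings unchanged.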

answered Nov 15 '22 06:11

akrun
