 

Match & replace within multiple quoted strings with REGEX

Tags: regex, r

I want to replace all spaces within quotes with underscores in R. I'm not sure how to define the quoted strings correctly when there is more than one. My first attempt fails, and I haven't even got on to handling both single and double quotes.

require(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('.*) (.*')", '$1_$2')
#> [1] "The 'quick brown' fox 'jumps_over' the lazy dog"
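To see why this attempt fails, it helps to look at what each capture group matched. A minimal sketch in base R, using `regexec()` with `perl = TRUE` on the same string: because `.*` is greedy, group 1 runs from the first `'` up to the last space that is still followed by a `'`, so only that one space gets replaced.

```r
s <- "The 'quick brown' fox 'jumps over' the lazy dog"
# regexec() returns the full match followed by each capture group
m <- regmatches(s, regexec("('.*) (.*')", s, perl = TRUE))[[1]]
m[2]  # group 1: "'quick brown' fox 'jumps" (greedy .* swallowed the first quoted phrase)
m[3]  # group 2: "over'"
```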

Grateful for help.

asked Feb 28 '26 05:02 by geotheory


2 Answers

Let's assume you need to match all non-overlapping substrings that start with ', continue with one or more characters other than ', and end with '. The pattern is then '[^']+'.
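Before replacing anything, you can check what '[^']+' matches by extracting the matches with `gregexpr()` and `regmatches()`. A minimal sketch on the example string used below:

```r
x <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
# [^']+ cannot run past a closing quote, so each quoted phrase is a separate match
quoted <- regmatches(x, gregexpr("'[^']+'", x))[[1]]
quoted  # "'quick cunning brown'" and "'jumps up and over'"
```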

Then, you may use the following base R code:

x = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
gr <- gregexpr("'[^']+'", x)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
x
## => [1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"

Alternatively, use the gsubfn package:

> library(gsubfn)
> rx <- "'[^']+'"
> s = "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
> gsubfn(rx, ~ gsub("\\s", "_", x), s)
[1] "The 'quick_cunning_brown' fox 'jumps_up_and_over' the lazy dog"

To support escape sequences, you may use a much more complex PCRE regex:

(?<!\\)(?:\\{2})*\K'[^'\\]*(?:\\.[^'\\]*)*'

Details:

  • (?<!\\) - no \ immediately before the current location
  • (?:\\{2})* - zero or more pairs of backslashes
  • \K - match reset operator
  • ' - a single quote
  • [^'\\]* - zero or more chars other than ' and \
  • (?:\\.[^'\\]*)* - zero or more sequences of:
    • \\. - a \ followed by any char except a newline
    • [^'\\]* - zero or more chars other than ' and \
  • ' - a single quote.
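The effect of the lookbehind and the backslash-pair handling is easiest to see on a short string. A minimal sketch, with `y` as a made-up example, comparing the simple pattern with the escape-aware one: the simple pattern starts a match at the escaped quote \', while the PCRE pattern skips escaped quotes entirely.

```r
# y contains the text: a \'b c\' 'd e'
y <- "a \\'b c\\' 'd e'"
simple  <- regmatches(y, gregexpr("'[^']+'", y))[[1]]
escaped <- regmatches(y, gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'",
                                  y, perl = TRUE))[[1]]
simple   # two matches; the first one ('b c\') is delimited by escaped quotes
escaped  # one match: 'd e'
```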

The R code would look like this:

x = "The \\' \\\\\\' \\\\\\\\'quick \\'cunning\\' brown' fox 'jumps up \\'and\\' over' the lazy dog"
cat(x, sep="\n")
gr <- gregexpr("(?<!\\\\)(?:\\\\{2})*\\K'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", x, perl=TRUE)
mat <- regmatches(x, gr)
regmatches(x, gr) <- lapply(mat, gsub, pattern="\\s", replacement="_")
cat(x, sep="\n")

Output:

The \' \\\' \\\\'quick \'cunning\' brown' fox 'jumps up \'and\' over' the lazy dog
The \' \\\' \\\\'quick_\'cunning\'_brown' fox 'jumps_up_\'and\'_over' the lazy dog
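The question also mentions double quotes. One way to cover both quote types, sketched here under the assumption that the input has no escaped quotes, is an alternation of the simple pattern for each quote character, wrapped in a small helper (`underscore_quoted` is a hypothetical name). Whichever quote character appears first opens the span, so an apostrophe inside "..." does not start a new match.

```r
# Hypothetical helper: replace whitespace with "_" inside '...' or "..." spans.
# Assumes no escaped quotes; use the PCRE pattern above if those can occur.
underscore_quoted <- function(x) {
  gr <- gregexpr("'[^']+'|\"[^\"]+\"", x)
  regmatches(x, gr) <- lapply(regmatches(x, gr), gsub,
                              pattern = "\\s", replacement = "_")
  x
}

s <- "The 'quick brown' fox \"jumps over\" the lazy dog"
out <- underscore_quoted(s)
cat(out, sep = "\n")
#> The 'quick_brown' fox "jumps_over" the lazy dog
```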
answered Mar 01 '26 21:03 by Wiktor Stribiżew


Try this:

library(stringi)
s = "The 'quick brown' fox 'jumps over' the lazy dog"
stri_replace_all(s, regex="('[a-z]+) ([a-z]+')", '$1_$2')
#> [1] "The 'quick_brown' fox 'jumps_over' the lazy dog"
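Note that this pattern requires exactly one space between two runs of lowercase letters, so quoted phrases with three or more words are left untouched. A quick check in base R with `gsub()`, using the same regex but with \\1/\\2 back-references in place of stringi's $1/$2:

```r
s2 <- "The 'quick cunning brown' fox 'jumps up and over' the lazy dog"
# '[a-z]+, one space, [a-z]+' never fits a three- or four-word phrase
out2 <- gsub("('[a-z]+) ([a-z]+')", "\\1_\\2", s2)
identical(out2, s2)  # TRUE: nothing was replaced
```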
answered Mar 01 '26 20:03 by AChervony


