
How to find and replace a particular character but only if it is in quotes?

Problem: I have thousands of documents that contain a specific character I don't want, e.g. the character a. These documents contain a variety of characters, but the a's I want to replace are inside double quotes or single quotes.

I would like to find and replace them, and I thought using Regex would be needed. I am using VSCode, but I'm open to any suggestions.

My attempt: I found the following regex, which matches a quoted string that contains the character inside the ().

".*?(r).*?"

However, this only highlights the entire quote. I want to highlight the character only.

Any solution, perhaps outside of regex, is welcome.

Example outcomes: Given the character a, find it and replace it with b

Somebody once told me "apples" are good for you => Somebody once told me "bpples" are good for you

"Aardvarks" make good kebabs => "Abrdvbrks" make good kebabs

The boy said "aaah!" when his mom told him he was eating aardvark => The boy said "bbbh!" when his mom told him he was eating aardvark

Ka Mok asked Feb 20 '18 03:02



2 Answers

Visual Studio Code

VS Code uses the JavaScript regex engine for its find / replace functionality. This means you are quite limited in what you can do with regex compared with other flavors such as .NET or PCRE.

Luckily, this flavor supports lookaheads, which let you look for characters without consuming them. So one way to ensure that we are inside a quoted string is to require that, after matching an a, the number of quotes remaining up to the end of the file / subject string is odd:

a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)


This looks for a's in a double-quoted string. To handle single-quoted strings instead, substitute every " with '. You can't handle both at the same time.
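As a quick check outside the editor, the lookahead pattern can be tried in JavaScript against the examples from the question. This is only a sketch: the helper name `replaceInDoubleQuotes` is made up for illustration, and each sample is treated as a whole subject string, so `$` means the end of that string.

```javascript
// Sketch only: the helper name is hypothetical, not part of VS Code or any library.
// The pattern matches an `a` that is followed by an odd number of double quotes
// before the end of the subject string.
const insideDoubleQuotes = /a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)/g;

function replaceInDoubleQuotes(text, replacement) {
  return text.replace(insideDoubleQuotes, replacement);
}

const samples = [
  'Somebody once told me "apples" are good for you',
  '"Aardvarks" make good kebabs',
  'The boy said "aaah!" when his mom told him he was eating aardvark',
];

for (const sample of samples) {
  console.log(replaceInDoubleQuotes(sample, 'b'));
}
```

The match is case-sensitive, so the capital A in "Aardvarks" is left alone, as in the expected outcome from the question.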

There is a problem with the regex above, however: it is thrown off by escaped double quotes within double-quoted strings. If that matters to you, handling them correctly takes a much longer pattern:

a(?=[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*)*$)
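To see the difference between the two patterns, here is a small JavaScript comparison on an input that contains an escaped quote. The input string and variable names are assumptions chosen for illustration; `String.raw` is used so the backslash stays in the string literally.

```javascript
// Sketch: compare the simple pattern with the escape-aware one.
// The sample input is invented for illustration.
const simple = /a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)/g;
const escapeAware = /a(?=[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*)*$)/g;

// The quoted string contains an escaped double quote: "a \" a"
const input = String.raw`"a \" a" and a`;

// The simple pattern counts the escaped quote as a real one,
// so it misses the first a inside the quotes.
const simpleResult = input.replace(simple, 'b');

// The escape-aware pattern skips over backslash-escaped characters.
const escapeAwareResult = input.replace(escapeAware, 'b');

console.log(simpleResult);      // "a \" b" and a
console.log(escapeAwareResult); // "b \" b" and a
```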

Applying these approaches to large files will probably result in a stack overflow, so let's see a better approach.

I am using VSCode, but I'm open to any suggestions.

That's great. Then I'd suggest using awk or sed or something more programmatic to achieve what you are after. Or, if you are able to use Sublime Text, there is a chance to work around this problem in a more elegant way.
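As one possible reading of "something more programmatic", here is a sketch of a single-pass scanner in JavaScript rather than awk or sed. It is not taken from the answer: the function name `replaceInQuotes` and its parameters are hypothetical, it assumes backslash is the escape character inside quotes, and it treats an unterminated quote as running to the end of the text, which differs from the regex approach above.

```javascript
// Sketch only: a hypothetical helper, not an existing library function.
// Walks the text once, tracking whether we are inside a quoted string.
function replaceInQuotes(text, from, to, quoteChars = '"') {
  let out = '';
  let open = null; // the quote character that opened the current string, or null

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    if (open !== null && ch === '\\' && i + 1 < text.length) {
      // Inside quotes: copy a backslash-escaped pair unchanged.
      out += ch + text[i + 1];
      i++;
    } else if (open === null) {
      // Outside quotes: copy as-is, and note an opening quote if we see one.
      if (quoteChars.includes(ch)) open = ch;
      out += ch;
    } else if (ch === open) {
      // Closing quote.
      open = null;
      out += ch;
    } else {
      // Inside quotes: replace the target character.
      out += ch === from ? to : ch;
    }
  }
  return out;
}

console.log(replaceInQuotes('Somebody once told me "apples" are good for you', 'a', 'b'));
```

Passing `'"\''` as `quoteChars` would also handle single-quoted strings, but apostrophes in ordinary prose (as in "don't") would then be read as opening quotes, so that is worth checking against real data first.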

Sublime Text

This is supposed to work on large files with hundreds of thousands of lines, but note that it works for a single character (here a); with some modifications it may work for a word or substring too:

Search for:

(?:"|\G(?<!")(?!\A))(?<r>[^a"\\]*+(?>\\.[^a"\\]*)*+)\K(a|"(*SKIP)(*F))(?(?=((?&r)"))\3)
                           ^              ^            ^

Replace it with: WHATEVER\3


RegEx Breakdown:

(?: # Beginning of non-capturing group #1
    "   # Match a `"`
    |   # Or
    \G(?<!")(?!\A)  # Continue matching from last successful match
                    # It shouldn't start right after a `"`
)   # End of NCG #1
(?<r>   # Start of capturing group `r`
    [^a"\\]*+   # Match anything except `a`, `"` or a backslash (possessively)
    (?>\\.[^a"\\]*)*+   # Match an escaped character followed by the previous
                        # pattern, repeated as much as possible
)\K     # End of CG `r`, reset all consumed characters
(   # Start of CG #2 
    a   # Match literal `a`
    |   # Or
    "(*SKIP)(*F)    # Match a `"` and skip over current match
)
(?(?=   # Start a conditional cluster, assuming a positive lookahead
    ((?&r)")    # Start of CG #3, recurse into CG `r` and match `"`
  )     # End of condition
  \3    # If conditional passed match CG #3
 )  # End of conditional


Three-step approach

Last but not least...

Matching a character inside quotation marks is tricky because the opening and closing delimiters are identical, so they cannot be distinguished from each other without looking at the adjacent text. What you can do is tag the closing delimiter with a marker character so that you can look for it later.

Step 1:

Search for: "[^"\\]*(?:\\.[^"\\]*)*"

Replace with: $0Я

Step 2:

Search for: a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)

Replace with whatever you expect.

Step 3:

Search for: Я

Replace with nothing to revert everything.
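The three steps can also be chained in code. The following JavaScript sketch assumes the marker character Я never occurs in the input, and the function name `threeStepReplace` is made up for illustration. Note that the editor's `$0` (whole match) is written `$&` in a JavaScript replacement string.

```javascript
// Sketch only: the marker and function name are assumptions for illustration.
const MARKER = 'Я'; // must not occur anywhere in the input

function threeStepReplace(text, replacement) {
  // Step 1: append the marker after every complete double-quoted string.
  const marked = text.replace(/"[^"\\]*(?:\\.[^"\\]*)*"/g, '$&' + MARKER);

  // Step 2: replace every a that is followed, within the same quoted
  // string, by a closing quote carrying the marker.
  const replaced = marked.replace(/a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)/g, replacement);

  // Step 3: remove the markers again.
  return replaced.split(MARKER).join('');
}

console.log(threeStepReplace('"Aardvarks" make good kebabs', 'b'));
```

Because each lookahead in step 2 only scans to the nearest marked closing quote instead of to the end of the file, this avoids the long end-anchored lookahead of the first approach.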


revo answered Oct 13 '22 22:10


/(["'])(.*?)(a)(.*?\1)/g

With the replace pattern:

$1$2$4

As far as I'm aware, VS Code uses the same regex engine as JavaScript, which is why I've written my example in JS.

The problem with this is that if you have multiple a's in one set of quotes, a single pass only removes one of them. So you need either some code behind it, or you hammering the replace button until no more matches are found, to repeat the pattern and get rid of all the a's between quotes:

const regex = /(["'])(.*?)(a)(.*?\1)/g;
const subst = `$1$2$4`; // keep everything except the captured a
let str = `"a"
"helapke"
Not matched - aaaaaaa
"This is the way the world ends"
"Not with fire"
"ABBA"
"abba",
'I can haz cheezburger'
"This is not a match'
`;

// Each pass removes at most one a per quoted string,
// so loop until nothing matches any more.
while (str.match(regex)) {
    str = str.replace(regex, subst);
}

console.log(str);
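As an alternative to looping, a replacer callback can handle every a in a quoted span in one pass. This is a sketch rather than the answer's own method: the function name `replaceInQuotedSpans` is hypothetical, and like the pattern above it assumes quoted strings do not span lines and contain no escaped quotes.

```javascript
// Sketch only: a hypothetical helper that avoids the while loop.
// Match each quoted span once, then replace inside the captured body.
function replaceInQuotedSpans(text, from, to) {
  return text.replace(/(["'])(.*?)\1/g, (match, quote, body) =>
    // split/join replaces every literal occurrence without regex escaping
    quote + body.split(from).join(to) + quote
  );
}

console.log(replaceInQuotedSpans(`say 'aaa' and "a" but not a`, 'a', 'b'));
// say 'bbb' and "b" but not a
```

Passing an empty string as `to` removes the character, which matches what the `$1$2$4` replacement does.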
KyleFairns answered Oct 13 '22 21:10