
Can Regex be used for this particular string manipulation?

I need to replace character (say) x with character (say) P in a string, but only if it is contained in a quoted substring. An example makes it clearer:

axbx'cxdxe'fxgh'ixj'k  -> axbx'cPdPe'fxgh'iPj'k

Let's assume, for the sake of simplicity, that quotes always come in pairs.

The obvious way is to just process the string one character at a time (a simple state machine approach);
however, I'm wondering if regular expressions can be used to do all the processing in one go.
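For reference, the character-at-a-time approach could look roughly like this in Python. This is only a sketch: the function name and its parameters are illustrative and not part of the original post, and it assumes every quote character simply toggles the "inside quotes" state.

```python
def replace_in_quotes_fsm(text, old="x", new="P", quote="'"):
    """Replace `old` with `new` only between pairs of `quote` characters."""
    out = []
    in_quotes = False  # the only state: are we inside a quoted substring?
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes  # every quote toggles the state
            out.append(ch)
        elif in_quotes and ch == old:
            out.append(new)  # replace only while inside quotes
        else:
            out.append(ch)
    return "".join(out)


print(replace_in_quotes_fsm("axbx'cxdxe'fxgh'ixj'k"))
```

It makes a single pass over the input and keeps one boolean of state.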

My target language is C#, but I guess my question pertains to any language with built-in or library support for regular expressions.

Cristian Diaconescu asked Sep 26 '08 10:09



2 Answers

I converted Greg Hewgill's Python code to C# and it worked!

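using System.Text.RegularExpressions;
using NUnit.Framework;

// Place this method inside an NUnit test fixture class.
[Test]
public void ReplaceTextInQuotes()
{
  // The lookahead succeeds only when an odd number of quotes follows the x,
  // i.e. when the x sits inside a quoted substring.
  Assert.AreEqual("axbx'cPdPe'fxgh'iPj'k",
    Regex.Replace("axbx'cxdxe'fxgh'ixj'k",
      @"x(?=[^']*'([^']|'[^']*')*$)", "P"));
}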

That test passed.

jop answered Nov 14 '22 03:11


I was able to do this with Python:

>>> import re
>>> re.sub(r"x(?=[^']*'([^']|'[^']*')*$)", "P", "axbx'cxdxe'fxgh'ixj'k")
"axbx'cPdPe'fxgh'iPj'k"

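What this does is use a lookahead assertion (?=...) to check that the character x is within a quoted string, without consuming any of the text that follows. It looks for some non-quote characters up to the next quote, then looks for a sequence of either single non-quote characters or quoted groups of characters, until the end of the string. The lookahead therefore succeeds only when an odd number of quotes follows the x, which (given balanced quotes) means the x is inside a quoted substring.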
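To make the pieces easier to follow, the same pattern can be spelled out with re.VERBOSE and a comment per part. This is a sketch: the names INSIDE_QUOTES and replace_in_quotes_lookahead are made up for illustration, and the inner group is written as non-capturing (?:...), which does not change the result of the substitution.

```python
import re

# The pattern from above, laid out with re.VERBOSE so each part can be commented.
INSIDE_QUOTES = re.compile(r"""
    x                 # the character to replace
    (?=               # lookahead: test what follows without consuming it
        [^']*'        #   non-quote characters up to the next quote
        (?:           #   then, repeated until the end of the string:
            [^']      #     a single non-quote character, or
          | '[^']*'   #     a complete quoted group
        )*
        $
    )
""", re.VERBOSE)


def replace_in_quotes_lookahead(text):
    """Replace x with P wherever the lookahead finds an odd number of quotes ahead."""
    return INSIDE_QUOTES.sub("P", text)


print(replace_in_quotes_lookahead("axbx'cxdxe'fxgh'ixj'k"))
```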

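This relies on your assumption that the quotes are always balanced. It is also not very efficient: the lookahead rescans the rest of the string for every x it tries, so the running time grows roughly quadratically with the length of the input.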
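If efficiency matters, one possible alternative (not the pattern above, and not from the original answer) is to match each complete quoted substring once and do the replacement in a callback. The names QUOTED and replace_in_quotes_callback are hypothetical, and the sketch still assumes balanced quotes:

```python
import re

# One complete quoted substring: a quote, any non-quote characters, a closing quote.
QUOTED = re.compile(r"'[^']*'")


def replace_in_quotes_callback(text, old="x", new="P"):
    """Rewrite `old` to `new` inside each quoted substring, leaving the rest untouched."""
    # re.sub calls the function once per quoted substring and splices in its result.
    return QUOTED.sub(lambda m: m.group(0).replace(old, new), text)


print(replace_in_quotes_callback("axbx'cxdxe'fxgh'ixj'k"))
```

Each quoted substring is scanned once, so nothing has to look ahead to the end of the string. The same idea should carry over to C# via the Regex.Replace overload that takes a MatchEvaluator.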

Greg Hewgill answered Nov 14 '22 03:11