
Conditional subexpression replacement using regular expressions

I have text input similar to that shown below. I'd like to add the word auto before each 'a=b' pattern, but only if it is part of a sequence following the keyword kywrd (separated by semicolons).

kywrd a=b;c=d;
e=f;
fnctn z;
g=h;

So the output I'm looking for here is:

kywrd2 auto a=b;auto c=d;
auto e=f;
fnctn z;
g=h;

The Perl 6 (Raku) code below uses a regular expression to add the auto keyword, but only before the first a=b pattern. Is there a simple way to perform the substitution for all patterns in the sequence, while leaving g=h; unmodified?

my Str $x = slurp "in.q";
$x ~~ s:g /kywrd\s+(\w+)\=(\w+)\;/kywrd2 auto $0=$1\;/;
spurt "out.q", $x;
asked Oct 30 '19 09:10 by user2023370



3 Answers

One possible way that keeps the regexing to a minimum:

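# Prefix every `x=y` item of a semicolon-separated sequence with 'auto '.
sub repl ($input) {
    $input.Str
    .split(';', :skip-empty)
    .map( 'auto ' ~ * ~ ';')
    .join('')
}

# Note: `.` also matches newlines in Raku, so (.+) captures everything up to
# the end of the string. That is fine for a single sequence like this one.
my $foo = 'kywrd a=b;c=d;d=e;';
$foo ~~ s:g /kywrd \s+ (.+)/kywrd2 { repl($0) }/;
$foo.say;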

Personally I'd prefer the method form subst over the s// operator though.

# The replacement is a block so that $0 refers to the current match.
$foo .= subst(/ kywrd \s+ (.+) /, { "kywrd2 { repl($0) }" }, :g);
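
A note on the method form, as a minimal sketch with made-up sample data: subst receives a plain string argument already interpolated, before any matching happens, so a replacement that refers to captures such as $0 is normally passed as a block, which runs once per match.

```raku
# Sample string invented for illustration.
my $s = 'k=v';

# Block replacement: it runs after the match, so $0 and $1 are that match's captures.
say $s.subst(/ (\w+) \= (\w+) /, { "$1=$0" });   # v=k
```

The s/// operator does not need this, because its replacement part is re-evaluated for each match.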
answered Sep 23 '22 04:09 by Holli


One way:

# Create a separate named regex that captures an `x=y;` pair:
my regex pair { (\w+) \= (\w+) \; (\s*) }
# (Capture `(\s*)` so formatting between pairs is retained)

# Generate and return 'auto'-ized replacement of a captured pair: 
sub auto-ize ($/) { "auto $0=$1;$2" }

$x ~~ s:g { kywrd \s+ <pair>+ } = "kywrd2 $<pair>».&auto-ize.join()";

All the code I've shown would be simple to understand for someone a little familiar with Raku but I'll explain it anyway.

  • I've broken out a named regex to match a pair. (See my answer to Difference in capturing and non-capturing regex scope in Perl 6 / Raku for details about why/how <pair> calls the pair regex.)

  • The auto-ize subroutine uses the match variable ($/) as its parameter. This is convenient because $0 etc. are then automatically aliased to the numbered captures associated with the passed match object.

  • I've used syntax of the form s [ ... ] = " ... " because I think it's more readable for this use case. (See mention of "different delimiters" in s/// doc.)

  • The "kywrd2 ..." string will be repeatedly evaluated and become a replacement of a match, once for each match of the multiple s:g matches.

  • The $<pair>».&auto-ize.join() bit is code being interpolated under double quoted string rules.

  • $<pair> is short for $/<pair>, i.e. the <pair> key of $/. It refers to the pair named capture associated with the match variable. The latter will correspond to each match of the multiple s:g matches in turn.

  • The + quantifier in the regex <pair>+ means that, if it matches, it produces a List of capture (match) objects rather than just one (as would be the case if the expression were instead just <pair> or <pair>?).

  • » treats its LHS operand as a tree or list (in this case a list of one or more capture/match objects, one per foo=bar;... pair) and walks over its elements. For each "leaf" element the » does the operation on its right. (» is a powerful operator but has nice simple use cases such as this one where it's just a notationally convenient and compact equivalent of a for loop. You can write it as >> if you prefer ASCII.)

  • .&auto-ize calls the auto-ize subroutine as if it were a method, using the operand to its left as the first argument.
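
The last few points can be tried in isolation. This is a minimal sketch, not part of the solution itself: the flip-pair helper and the sample string are hypothetical, chosen only to show that <pair>+ yields a list of match objects and that » with .& behaves like a map over that list.

```raku
# Hypothetical helper and sample data, for illustration only.
my regex pair { (\w+) \= (\w+) \; }

# Taking $/ as the parameter makes $0 and $1 refer to the passed match.
sub flip-pair ($/) { "$1=$0;" }

if 'a=b;c=d;' ~~ / <pair>+ / {
    say $<pair>.elems;                  # 2 (one match object per pair)
    say $<pair>».&flip-pair.join;       # b=a;d=c;
    say $<pair>.map(&flip-pair).join;   # same result with an explicit map
}
```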

The test input data from @PolarBear's answer:

kywrd a=b;c=d;
e=f;
fnctn z;
g=h;
k=m;
fnctn y;
kywrd m=n;
k=j;
kywrd z=a;b=i;
kywrd c=x;e=i;
z=q;
fnctn o;

Putting that into in.q and saying the resulting out.q displays:

kywrd2 auto a=b;auto c=d;
auto e=f;
fnctn z;
g=h;
k=m;
fnctn y;
kywrd2 auto m=n;
auto k=j;
kywrd2 auto z=a;auto b=i;
kywrd2 auto c=x;auto e=i;
auto z=q;
fnctn o;
answered Sep 25 '22 04:09 by raiph


Not very elegant, but workable code (the old-school Perl way):

#!/usr/bin/perl

use strict;
use warnings;

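OUTER: while(<DATA>) {
    if( s/kywrd /kywrd2 / ) {
        do {
            # A new kywrd line may directly follow a sequence, so rename it here too
            s/kywrd /kywrd2 /;
            # A line without any x=y pair ends the sequence
            if( ! s/(\w+)=(\w+)/auto $1=$2/g ) {
                print;
                next OUTER;
            }
            print;
        } while ( <DATA> );
    } else {
        print;
    }
}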

__DATA__
kywrd a=b;c=d;
e=f;
fnctn z;
g=h;
k=m;
fnctn y;
kywrd m=n;
k=j;
kywrd z=a;b=i;
kywrd c=x;e=i;
z=q;
fnctn o;

I need to take a look at Raku and see what kind of animal it is.

answered Sep 23 '22 04:09 by Polar Bear