
Perl Replacing digits with "x" in string, but only in one particular location

Tags: regex, perl

I am parsing a log file filled with various errors. These are web errors, and it means that a client made a goof in formatting the date for our website. The log looks like this:

Error 123: Customer 2: Bad Date [17/12/2014]
Error 123: Customer 2: Bad Date [19/12/2014]
Error 123: Customer 1: Bad Date [123/23/222]
Error 123: Customer 2: Bad Date [null]
Error 123: Customer 6: Bad Date [12/14:]
Error 123: Customer 6: Bad Date [12/16:]

Now, the first two are really the same error for the same customer. In both lines, the date was reported as DD/MM/YYYY instead of YYYY/MM/DD, so I don't need to report this error twice. The last two lines are also the same error for the same customer: they used MM/DD and left off the year. The null date is a different error, even though I have already reported a Bad Date error for Customer #2. Somewhere, they're passing a null date.

What I'd like to do is compare the lines this way:

Error 123: Customer 2: Bad Date [xx/xx/xxxx]
Error 123: Customer 2: Bad Date [xx/xx/xxxx]
Error 123: Customer 1: Bad Date [xxx/xx/xxx]
Error 123: Customer 2: Bad Date [null]
Error 123: Customer 6: Bad Date [xx/xx:]
Error 123: Customer 6: Bad Date [xx/xx:]

Now, it's easy to see that the first two and last two lines are really the same error. The question is how to do this with a regular expression. I want to change all digits between the [ and ] to x, but I don't want to touch the rest of the string, so I don't want to convert the Error or Customer numbers to x.

I first tried:

$error =~ s/(\[.*?)\d/$1x/g;

But that only touches the first digit in the brackets. I've tried it without the non-greedy qualifier, but that only touches the last digit.

I could simply do this:

$error =~ s/\d/x/g;

But that replaces every digit in the line with an x, destroying my Error number and Customer number.

I can pass the error line over and over again until there's no more replacement:

while ( my $error = <DATA> ) {
    chomp $error;
    while ( $error =~ s/(\[.*?)\d/$1x/ ) {
        1;
    }
    say qq(Error: "$error");
}

But there must be a way I can do this without having to loop through a while loop multiple times.

Is there a way to efficiently replace all occurrences of a digit with an x, but only between the two square brackets?

David W. asked Jan 27 '14 21:01



4 Answers

I'd use this solution:

$error =~ s{(\[ [^\]]+ \])}{
  (my $date = $1) =~ tr/0-9/x/;
  $date;
}ex;

I originally thought this wouldn't work in older perls without a re-entrant regex engine, but I was wrong: I tried that code with a freshly-brewed perl 5.10.1, and it worked just fine.

Alternatively, you could abuse an lvalue substr:

if ($error =~ /\[/gc) {
  my $start  = pos $error;
  my $length = index($error, ']', $start) - $start;
  substr($error, $start, $length) =~ tr/0-9/x/;
}
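
A variation on the lvalue substr idea, as a minimal sketch. The helper name mask_bracketed_digits is hypothetical, and this version locates both brackets with index (rather than pos) so that a line with no closing bracket is returned unchanged:

```perl
use strict;
use warnings;

# Hypothetical helper: mask digits inside the first [...] of a line.
sub mask_bracketed_digits {
    my ($line) = @_;
    my $open = index($line, '[');
    return $line if $open < 0;                   # no opening bracket
    my $close = index($line, ']', $open + 1);
    return $line if $close < 0;                  # no closing bracket
    # substr used as an lvalue: tr/// edits that slice of $line in place.
    # The one-character replacement list 'x' is repeated for every digit.
    substr($line, $open + 1, $close - $open - 1) =~ tr/0-9/x/;
    return $line;
}

print mask_bracketed_digits('Error 123: Customer 2: Bad Date [17/12/2014]'), "\n";
```

Because tr/// only touches the slice between the brackets, the Error and Customer numbers are left alone.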
amon answered Sep 23 '22 18:09


You can't do it all in one pass. You need to extract the part to which you want to do the replacements, apply the replacements, then reform the string.

if (
   my ($pre, $date, $post) =
      /^ ( [^\[\]]* \[ )( [^\[\]]* )( \] .* )/x
) {
   $date =~ s/[0-9]/x/g;
   $_ = "$pre$date$post";
}

This can be done more concisely.

s{ ( \[ [^\[\]]* \] ) }
 { ( my $x = $1 ) =~ s{[0-9]}{x}g; $x }xeg;

Or, if you have Perl 5.14 or later (which added the non-destructive /r modifier),

s{ ( \[ [^\[\]]* \] ) }
 { $1 =~ s{[0-9]}{x}rg }xeg;
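
To tie this back to the goal of reporting each distinct error only once, here is a rough sketch that uses the masked line as a hash key. The sub name unique_errors and the %seen hash are assumptions for illustration, not part of the question's code, and the /r modifier requires Perl 5.14 or later:

```perl
use strict;
use warnings;
use 5.014;    # for the /r (non-destructive) modifier and say

# Hypothetical helper: keep the first line of each distinct masked form.
sub unique_errors {
    my @lines = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        # Mask digits inside [...] only; /r returns the result and leaves $line intact.
        my $key = $line =~ s{ ( \[ [^\[\]]* \] ) }{ $1 =~ s{[0-9]}{x}gr }xegr;
        push @unique, $line unless $seen{$key}++;
    }
    return @unique;
}

my @log = (
    'Error 123: Customer 2: Bad Date [17/12/2014]',
    'Error 123: Customer 2: Bad Date [19/12/2014]',
    'Error 123: Customer 1: Bad Date [123/23/222]',
    'Error 123: Customer 2: Bad Date [null]',
    'Error 123: Customer 6: Bad Date [12/14:]',
    'Error 123: Customer 6: Bad Date [12/16:]',
);
say for unique_errors(@log);
```

With the sample log from the question, this should print four lines: the first occurrence of each of the four distinct errors, with the original dates preserved.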
ikegami answered Sep 20 '22 18:09


I always like to break these problems up into much simpler pieces:

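sub xdigit
{
    my $str = shift;
    # tr/// takes character ranges, not a regex class, so no square brackets;
    # a replacement list shorter than the search list repeats its last character.
    $str =~ tr/0-9/x/;
    return "[$str]";    # put back the brackets consumed by the match
}

my $x = 'Error 123: Customer 2: Bad Date [17/12/2014]';
$x =~ s/\[(.*?)\]/xdigit($1)/e;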

Outputs:

Error 123: Customer 2: Bad Date [xx/xx/xxxx]

woolstar answered Sep 20 '22 18:09


You could use:

$error =~ / \[ /gx;
$error =~ s/ \G (.*?) [0-9] /$1x/gx;

The match with the /g modifier in scalar context leaves the search position (pos, i.e. the starting point for the next search) just after the matched bracket. The substitution then starts from that point (\G) and replaces the first digit it finds after it. Because of /g, the position then moves past the substituted digit, and the search and substitution are repeated until the end of the string (or, with ([^]]*?) instead of (.*?), until the first closing bracket).

In your first try, the bracket is found only once: the first substitution moves the search position past the substituted digit, and the next search fails to find another bracket. Add use re 'debug'; to watch the position move.
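
As a minimal sketch of the ([^]]*?) variant mentioned above: the helper name mask_with_g_anchor is hypothetical, and the if guard is an addition, on the assumption that a failed m//g leaves pos undefined so that \G would otherwise anchor at the start of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: mask digits between the first '[' and the next ']'.
sub mask_with_g_anchor {
    my ($error) = @_;
    if ($error =~ /\[/g) {    # scalar-context /g leaves pos($error) just past '['
        # \G pins each match to where the previous one ended;
        # [^\]]*? cannot cross ']', so the substitution stops there.
        $error =~ s/\G([^\]]*?)[0-9]/$1x/g;
    }
    return $error;
}

print mask_with_g_anchor('Error 123: Customer 2: Bad Date [17/12/2014] code 9'), "\n";
```

With the negated character class, digits after the closing bracket (the trailing 9 in this made-up line) should be left untouched, whereas (.*?) would keep going to the end of the string.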

Thomas answered Sep 24 '22 18:09