I am parsing a log file filled with various errors. These are web errors, and each one means that a client made a goof when formatting the date for our website. The log looks like this:
Error 123: Customer 2: Bad Date [17/12/2014]
Error 123: Customer 2: Bad Date [19/12/2014]
Error 123: Customer 1: Bad Date [123/23/222]
Error 123: Customer 2: Bad Date [null]
Error 123: Customer 6: Bad Date [12/14:]
Error 123: Customer 6: Bad Date [12/16:]
Now, the first two are really the same error for the same customer. In both lines, the date was reported as DD/MM/YYYY instead of YYYY/MM/DD, so I don't need to report this error twice. The last two lines are also the same error for the same customer. They used MM/DD and left off the year. The null date is a different error, even though I already reported Customer #2's Bad Date error. Somewhere, they're passing a null date.
What I'd like to do is compare the lines this way:
Error 123: Customer 2: Bad Date [xx/xx/xxxx]
Error 123: Customer 2: Bad Date [xx/xx/xxxx]
Error 123: Customer 1: Bad Date [xxx/xx/xxx]
Error 123: Customer 2: Bad Date [null]
Error 123: Customer 6: Bad Date [xx/xx:]
Error 123: Customer 6: Bad Date [xx/xx:]
Now, it's easy to see that the first two and last two lines are really the same error. The question is how to do this with a regular expression. I want to change all digits between the [ and ] to x, but I don't want to touch the rest of the string, so I don't want to convert the Error or Customer numbers to x.
I first tried:
$error =~ s/(\[.*?)\d/$1x/g;
But that only touches the first digit in the brackets. I've tried it without the non-greedy qualifier, but that only touches the last digit.
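Running both attempts on one sample line makes the difference visible. This is a small sketch; the variable names are only for illustration. Under /g each new match starts where the previous one ended, so after the first digit is replaced there is no second [ left to anchor on. The greedy .* instead runs to the end of the string and backtracks only as far as the last digit.

```perl
use strict;
use warnings;

my $line = 'Error 123: Customer 2: Bad Date [17/12/2014]';

# Non-greedy: the first match is "[" plus the first digit. /g then resumes
# after that digit, and no further "[" exists, so nothing else changes.
(my $lazy = $line) =~ s/(\[.*?)\d/$1x/g;

# Greedy: .* swallows the rest of the line and backtracks to the LAST digit.
(my $greedy = $line) =~ s/(\[.*)\d/$1x/g;

print "$lazy\n";     # Error 123: Customer 2: Bad Date [x7/12/2014]
print "$greedy\n";   # Error 123: Customer 2: Bad Date [17/12/201x]
```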
I could simply do this:
$error =~ s/\d/x/g;
But that replaces every digit with an x, destroying my Error number and Customer number.
I can pass the error line over and over again until there's no more replacement:
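use feature 'say';    # say() needs this (or "use v5.10;")
while ( my $error = <DATA> ) {
    chomp $error;
    # Each pass replaces only one digit after the "["; repeat until none is left.
    while ( $error =~ s/(\[.*?)\d/$1x/ ) {
        1;
    }
    say qq(Error: "$error");
}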
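For reference, the same multi-pass loop can be written with Perl's `1 while COND` statement-modifier idiom. It is shorter but not faster: each pass still rewrites a single digit. In this sketch the sample lines are taken from the log above and held in an array instead of being read from <DATA>.

```perl
use strict;
use warnings;

my @lines = (
    'Error 123: Customer 2: Bad Date [17/12/2014]',
    'Error 123: Customer 1: Bad Date [123/23/222]',
    'Error 123: Customer 2: Bad Date [null]',
    'Error 123: Customer 6: Bad Date [12/14:]',
);

# The loop variable is aliased to each array element, so the
# substitution edits @lines in place.
for my $error (@lines) {
    # Keep substituting one digit at a time until no digit follows a "[".
    1 while $error =~ s/(\[.*?)\d/$1x/;
}

print "$_\n" for @lines;
```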
But there must be a way I can do this without having to loop through a while loop multiple times.
Is there a way to efficiently replace all occurrences of a digit with an x, but only between the two square brackets?
I'd use this solution:
$error =~ s{(\[ [^\]]+ \])}{
(my $date = $1) =~ tr/0-9/x/;
$date;
}ex;
I originally thought this wouldn't work in older perls without a re-entrant regex engine, but I was wrong: I tried that code with a freshly brewed perl 5.10.1, and it worked just fine.
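A self-contained sketch that applies this substitution to every sample line from the question (held in an array here rather than read from a file). The /e flag makes the replacement side run as Perl code, and its last value, $date, becomes the replacement; /x only allows whitespace inside the pattern.

```perl
use strict;
use warnings;

my @lines = (
    'Error 123: Customer 2: Bad Date [17/12/2014]',
    'Error 123: Customer 2: Bad Date [19/12/2014]',
    'Error 123: Customer 1: Bad Date [123/23/222]',
    'Error 123: Customer 2: Bad Date [null]',
    'Error 123: Customer 6: Bad Date [12/14:]',
    'Error 123: Customer 6: Bad Date [12/16:]',
);

for my $error (@lines) {
    # Capture "[...]", copy it into $date, mask the copy's digits with tr,
    # and return the masked copy as the replacement text.
    $error =~ s{(\[ [^\]]+ \])}{
        (my $date = $1) =~ tr/0-9/x/;
        $date;
    }ex;
}

print "$_\n" for @lines;
```

Lines with duplicate errors now compare equal as plain strings.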
Alternatively, you could abuse an lvalue substr:
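# Match the opening bracket; /g records the position after it in pos($error),
# and /c keeps pos() from being reset if there is no match.
if ($error =~ /\[/gc) {
    my $start = pos $error;
    my $length = index($error, ']', $start) - $start;
    # substr() is an lvalue here, so tr/// edits only that slice of $error.
    substr($error, $start, $length) =~ tr/0-9/x/;
}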
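The lvalue substr trick can be wrapped in a small function. The name mask_bracket_digits is hypothetical, and the `$end >= 0` check is an addition not in the answer: without it, a line with a [ but no ] makes index() return -1, which turns the computed length into a negative number that substr() interprets as "stop that many characters before the end of the string", not what is intended.

```perl
use strict;
use warnings;

# Hypothetical helper: mask the digits between the first "[" and the next "]".
sub mask_bracket_digits {
    my ($error) = @_;                     # work on a copy
    if ($error =~ /\[/g) {                # pos($error) is now just past "["
        my $start = pos $error;
        my $end   = index($error, ']', $start);
        # Only touch the slice when a closing bracket was actually found.
        substr($error, $start, $end - $start) =~ tr/0-9/x/ if $end >= 0;
    }
    return $error;
}

print mask_bracket_digits($_), "\n" for (
    'Error 123: Customer 6: Bad Date [12/14:]',
    'Error 123: Customer 2: Bad Date [null]',
);
```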
You can't do it all in one pass. You need to extract the part to which you want to do the replacements, apply the replacements, then reform the string.
if (
my ($pre, $date, $post) =
/^ ( [^\[\]]* \[ )( [^\[\]]* )( \] .* )/x
) {
$date =~ s/[0-9]/x/g;
$_ = "$pre$date$post";
}
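A runnable sketch of this split-and-rejoin approach over a few sample lines. The third line, which has no brackets, is a hypothetical addition showing that the if guard leaves such lines untouched. The for loop aliases $_ to each array element, so assigning to $_ writes back into @lines.

```perl
use strict;
use warnings;

my @lines = (
    'Error 123: Customer 2: Bad Date [17/12/2014]',
    'Error 123: Customer 2: Bad Date [null]',
    'Error 456: Customer 3: timeout',    # hypothetical line with no brackets
);

for (@lines) {
    # Split into: text up to and including "[", the bracket contents,
    # and everything from "]" onward.
    if ( my ($pre, $date, $post) =
            /^ ( [^\[\]]* \[ )( [^\[\]]* )( \] .* )/x ) {
        $date =~ s/[0-9]/x/g;            # mask digits in the middle part only
        $_ = "$pre$date$post";           # reassemble the line
    }
}

print "$_\n" for @lines;
```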
This can be done more concisely.
s{ ( \[ [^\[\]]* \] ) }
{ ( my $x = $1 ) =~ s{[0-9]}{x}g; $x }xeg;
Or, if you have Perl 5.14 or later, you can use the /r (non-destructive substitution) modifier:
s{ ( \[ [^\[\]]* \] ) }
{ $1 =~ s{[0-9]}{x}rg }xeg;
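Because of the /g flag, both versions rewrite every bracketed group on the line, not just the first. Perl 5.14 also added /r to tr///, so the inner substitution can be swapped for a transliteration; that swap is my own variant, not part of the answer. The two-date log line below is hypothetical.

```perl
use strict;
use warnings;
use v5.14;    # /r on tr/// (and s///) needs Perl 5.14+

# Hypothetical line with two bracketed dates.
my $line = 'Error 123: Customer 2: Bad Date [17/12/2014] retry [03/01/2015]';

# For each "[...]" group, return a digit-masked copy of $1 (tr///r does not
# modify $1, which is read-only anyway).
(my $masked = $line) =~ s{ ( \[ [^\[\]]* \] ) }{ $1 =~ tr/0-9/x/r }xeg;

print "$masked\n";
```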
I always like to break these problems up into much simpler pieces:
sub xdigit
{
my $str= shift ;
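# tr/// takes lists of characters, not regex classes, so "[0-9]" would also
# map "[" and "]"; a single "x" is repeated for every digit.
$str =~ tr/0-9/x/ ;
"[$str]"
}
my $x= 'Error 123: Customer 2: Bad Date [17/12/2014]' ;
$x =~ s/\[(.*?)\]/xdigit($1)/e ;
print "$x\n" ;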
Outputs:
Error 123: Customer 2: Bad Date [xx/xx/xxxx]
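Taking the piece-by-piece idea one step further toward the original goal, here is a sketch that normalizes each line and reports only the first line of each distinct error. The normalize helper, the @log array and the %seen hash are assumptions for illustration; xdigit is the same helper, written with tr/0-9/x/.

```perl
use strict;
use warnings;

# Mask every digit in a string and put the brackets back.
sub xdigit {
    my $str = shift;
    $str =~ tr/0-9/x/;
    return "[$str]";
}

# Hypothetical helper: mask digits only inside the first [...] group.
sub normalize {
    my $line = shift;
    $line =~ s/\[(.*?)\]/xdigit($1)/e;
    return $line;
}

my @log = (
    'Error 123: Customer 2: Bad Date [17/12/2014]',
    'Error 123: Customer 2: Bad Date [19/12/2014]',
    'Error 123: Customer 1: Bad Date [123/23/222]',
    'Error 123: Customer 2: Bad Date [null]',
    'Error 123: Customer 6: Bad Date [12/14:]',
    'Error 123: Customer 6: Bad Date [12/16:]',
);

# Keep a line only the first time its normalized form is seen.
my %seen;
my @unique = grep { !$seen{ normalize($_) }++ } @log;

print "$_\n" for @unique;
```

The reported lines keep their original dates, so each distinct mistake is shown once with a real example.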
You could use:
$error =~ / \[ /gx;
$error =~ s/ \G (.*?) [0-9] /$1x/gx;
The search operation with the modifier /g initially positions the anchor (i.e. the start point for the next search) behind the matched string. The substitution operation then searches from this point (\G) and replaces the first digit somewhere behind it. Due to the /g, the anchor is additionally moved behind the substituted digit, and search + substitution are repeated until the end of the string (or, with ([^]]*?) instead of (.*?), until the first closing bracket).
In your first try, the bracket is found only once: the first substitution moves the anchor behind the substituted digit, and the next search fails to find a bracket. Add use re 'debug'; to see the anchor moving.
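One caveat is worth guarding against, sketched below with a hypothetical wrapper named mask_after_bracket: if a line contains no [, the failed m//g resets pos() to undef, \G then matches at offset 0, and the substitution would turn the Error and Customer numbers into x as well. Checking the bracket match first avoids that, and ([^]]*?) stops the substitution at the closing bracket.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the \G technique.
sub mask_after_bracket {
    my ($error) = @_;
    # Only substitute if a "[" was found; this match also sets pos($error).
    if ( $error =~ / \[ /gx ) {
        # \G starts at pos(); each /g round resumes after the last digit.
        # [^]]*? cannot cross "]", so digits after the bracket are kept.
        $error =~ s/ \G ([^]]*?) [0-9] /$1x/gx;
    }
    return $error;
}

print mask_after_bracket($_), "\n" for (
    'Error 123: Customer 2: Bad Date [17/12/2014]',
    'Error 123: Customer 2: no date',
);
```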