Is there an efficient way to substitute a bunch of strings using values from a Perl hash?
For example,
$regex{foo} = "bar";
$regex{hello} = "world";
$regex{python} = "perl";
open(F, "myfile.txt");
while (<F>) {
    foreach $key (keys %regex) {
        s/$key/$regex{$key}/g;
    }
}
close(F);
Is there a way to accomplish the above in Perl?
First question: are you sure that what you have is inefficient?
Second, the most obvious next step would be to pull everything into a single regex:
my $check = join '|', keys %regex;
And then you can do the substitution as:
s/($check)/$regex{$1}/g;
This can still be "slow" if the keys overlap enough that the regex engine has to recheck the same letters repeatedly. You could use something like Regexp::Optimizer to eliminate the overlap, but the cost of optimising may exceed the cost of just running the substitutions, depending on how many replacements (key/value pairs in your hash) you have and how many lines you're modifying. Beware of premature optimisation!
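As a rough, self-contained sketch of the single-regex approach described above: the hash contents are taken from the question, while the helper name substitute_all is made up for illustration. It assumes the keys are plain literal strings with no regex metacharacters.

```perl
use strict;
use warnings;

my %regex = (foo => 'bar', hello => 'world', python => 'perl');

# One alternation built from all keys. The keys are interpolated as-is,
# so this assumes they contain no regex metacharacters.
my $check = join '|', keys %regex;

# Hypothetical helper: apply every replacement to one line in a single pass.
sub substitute_all {
    my ($line) = @_;
    $line =~ s/($check)/$regex{$1}/g;
    return $line;
}

print substitute_all("hello foo, I like python\n");   # prints "world bar, I like perl"
```

One behavioural difference from the original loop: here each piece of text is replaced at most once, because the replacement text is not rescanned. In the nested-loop version, a value that happens to contain another key could be substituted again on a later iteration. Also note that the lookup $regex{$1} uses the matched text, so it only finds a value when the key is a literal string rather than a pattern.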
Note that, of course, your example code isn't doing anything with the text after the substitution. It won't modify the file in-place, so I'm assuming you're handling that separately.
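If you do want the substituted text to end up somewhere, one possible shape is sketched below. It writes to a separate output file rather than editing in place; the helper name rewrite_file and both path arguments are hypothetical, and the keys are again assumed to be literal strings.

```perl
use strict;
use warnings;

my %regex = (foo => 'bar', hello => 'world', python => 'perl');
my $check = join '|', keys %regex;

# Hypothetical helper: copy $in_path to $out_path, applying the
# substitutions to every line on the way through.
sub rewrite_file {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        $line =~ s/($check)/$regex{$1}/g;
        print {$out} $line;
    }
    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return;
}

# Example call (file names are placeholders):
# rewrite_file('myfile.txt', 'myfile.out');
```

Writing to a second file and renaming it afterwards is one way to avoid losing the original if something fails part-way through.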
Define a regexp that matches any of the keys.
$regex = join("|", map {quotemeta} keys %regex);
Replace any match of $regex by $regex{$1}.
s/($regex)/$regex{$1}/go;
Omit the o modifier if $regex changes during the execution of the program.
Note that if there are keys that are a prefix of another key (e.g. f and foo), whichever comes first in the joined regexp will be seen as a match (e.g. f|foo matches f but foo|f matches foo in foobar). If that can happen, you may need to sort keys %regex according to which match you want to win. (Thanks to ysth for pointing this out.)
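For instance, if the longest key should win, one option is to sort the keys by descending length before joining them. The sketch below is only an illustration under that assumption: the sample hash and the variable names $alternation, $pattern and $text are invented here, and qr// is used to precompile the pattern instead of relying on the o modifier.

```perl
use strict;
use warnings;

# Sample data where one key ("f") is a prefix of another ("foo").
my %regex = (f => 'X', foo => 'bar');

# Longest keys first, so "foo" is tried before its prefix "f";
# ties are broken alphabetically to keep the order deterministic.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) || $a cmp $b } keys %regex;

# Compile once; the capture group makes the matched key available as $1.
my $pattern = qr/($alternation)/;

(my $text = "foobar f") =~ s/$pattern/$regex{$1}/g;
print "$text\n";   # prints "barbar X"
```

Without the sort, the result would depend on the order in which keys returns the hash keys, which is not something to rely on.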