I have about 400,000 files in which some text needs to be replaced.
I tried the following Perl script:
@files = <*.html>;
foreach $file (@files) {
`perl -0777 -i -pe 's{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsmi;' $file`;
`perl -0777 -i -pe 's{<div[^>]+?class="generic"[^>]*>[^\s]*<small>[^\s]*Author.*?</div>.*?</div>.*?</div>.*?</div>.*?</div>}{}gsmi;' $file`;
`perl -0777 -i -pe 's{<script[^>]+?src="javascript.*?"[^>]*>.*?</script>}{}gsmi;' $file`;
`perl -p -i -e 's/.css.html/.css/g;' $file`;
}
I don't know Perl well, and the script runs far too slowly (it updates only about 180 files per day).
Is there a way to speed it up?
Thank you in advance!
PS: When I tested it on a smaller number of files, I noticed much better performance.
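Calling perl from perl will always be slower than doing all the work in one process: the script above starts four new perl processes per file, and each of them reads and rewrites the whole file. So the solution might be: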
perl -i -pe 'BEGIN { undef $/ }
s{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsmi;
s{<div[^>]+?class="generic"[^>]*>[^\s]*<small>[^\s]*Author.*?</div>.*?</div>.*?</div>.*?</div>.*?</div>}{}gsmi;
s{<script[^>]+?src="javascript.*?"[^>]*>.*?</script>}{}gsmi;
s/\.css\.html/.css/g;
' *.html