Disclaimer: I'm not a regex expert.
I'm using Python's re module to perform regex matching on many .htm files. One of the patterns is something like this:
<bla><blabla>87765.*</blabla><bla>
The problem I've encountered is that instead of finding all (say) five occurrences of the pattern, it finds only one, because it welds all the occurrences into a single match: it uses the <bla><blabla>87765 part of the first occurrence and the </blabla><bla> part of the last occurrence in the page.
Is there any way to tell re to find the smallest match?
You can use a reluctant (non-greedy) quantifier in your pattern (for more details, see the Python documentation on the *?, +?, and ?? operators):
<bla><blabla>87765.*?</blabla><bla>
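To see the difference, here is a minimal sketch comparing the greedy and reluctant versions with re.findall. The sample page is made up for illustration, since the question does not show the real HTML:

```python
import re

# Hypothetical sample page with two occurrences of the pattern
# (the real .htm content is not shown in the question).
html = (
    "<bla><blabla>87765 first</blabla><bla>"
    "<p>filler</p>"
    "<bla><blabla>87765 second</blabla><bla>"
)

# Greedy: .* runs as far as it can, so both occurrences are welded together.
greedy = re.findall(r"<bla><blabla>87765.*</blabla><bla>", html)

# Reluctant: .*? stops at the first closing sequence it can reach.
lazy = re.findall(r"<bla><blabla>87765.*?</blabla><bla>", html)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

Note that . does not match a newline by default, so if one occurrence can span several lines you would probably also need to pass re.DOTALL.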
Or, exclude < from the characters that can be matched:
<bla><blabla>87765[^<]*</blabla><bla>
This works only if there are no child tags between <blabla> and </blabla>.
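A small sketch of that limitation, again with made-up sample strings: the negated character class matches flat content, but finds nothing once a child tag appears inside <blabla>.

```python
import re

pattern = re.compile(r"<bla><blabla>87765[^<]*</blabla><bla>")

# Hypothetical inputs: one with plain text, one with a child tag inside.
flat = "<bla><blabla>87765 plain text</blabla><bla>"
nested = "<bla><blabla>87765 <b>bold</b> text</blabla><bla>"

flat_hits = pattern.findall(flat)
nested_hits = pattern.findall(nested)

print(len(flat_hits))    # 1
print(len(nested_hits))  # 0, because [^<]* cannot cross the <b> tag
```

If child tags can occur, the .*? version is likely the safer of the two.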