I want to replace:
'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''
With:
='''<font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>'''=
Now my existing code is:
$html =~ s/\n(.+)<font size=\".+?\">(.+)<\/font>(.+)\n/\n=$1$2$3=\n/gm
However this ends up with this as the result:
=''' SUMMER/WINTER CONFIGURATION FILES</font>'''=
I can see what is happening: it is matching from <font size=" all the way up to the end of <font color="blue">, which is not what I want. I want it to stop at the first instance of ", not the last. I thought that is what putting the ? there would do; however, I've tried .+, .+?, .* and .*? with the same result each time.
Does anyone have any idea what I am doing wrong?
Write .+? in all places to make each match non-greedy.
$html =~ s/\n(.+?)<font size=\".+?\">(.+?)<\/font>(.+?)\n/\n=$1$2$3=\n/gm
                ^                ^      ^            ^
Also try to avoid using regular expressions to parse HTML. Use an HTML parser if possible.
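To see what the ? changes in isolation, here is a minimal sketch. It assumes the sample line from the question is stored in a variable called $line (a name chosen only for this illustration) and looks at the size attribute alone, not the full substitution.

```perl
use strict;
use warnings;

# Sample line from the question; $line is a hypothetical name for this sketch.
my $line = q{'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''};

# Greedy: .+ runs to the end of the line, then backtracks to the LAST ">.
my ($greedy) = $line =~ /(<font size=".+">)/;

# Non-greedy: .+? grows one character at a time and stops at the FIRST ">.
my ($lazy) = $line =~ /(<font size=".+?">)/;

print "greedy: $greedy\n";   # greedy: <font size="3"><font color="blue">
print "lazy:   $lazy\n";     # lazy:   <font size="3">
```

The greedy form swallows the second font tag as well, which is the kind of over-matching described in the question; the non-greedy form ends at the first closing quote.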
You could change .+ to [^"]+ (instead of "match anything", "match anything that isn't a double quote"...)
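As a rough sketch of how that could look in the full substitution: the example below assumes $html holds the sample line from the question wrapped in newlines, and it keeps the non-greedy .+? from the other answer for the remaining groups. That combination is a choice made for this illustration, not something either answer spells out.

```perl
use strict;
use warnings;

# Sample input from the question, with the surrounding newlines the pattern expects.
my $html = qq{\n'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''\n};

# [^"]+ cannot cross a double quote, so the size value must end at the first closing quote.
$html =~ s/\n(.+?)<font size="[^"]+">(.+?)<\/font>(.+?)\n/\n=$1$2$3=\n/gm;

print $html;   # ='''<font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>'''=
```

A negated character class states the stopping condition directly, so it does not depend on how the rest of the pattern backtracks.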