I have a regular expression to match strings like:
--D2CBA65440D
--77094A27E09
--77094A27E
--770
--77094A27E09--
Basically, it matches a hexadecimal string that is surrounded by one or more line breaks or whitespace characters, has the prefix --, and may or may not have -- as a suffix.
I use the following Python code, and it works fine most of the time:
hexaPattern = "\s--[0-9a-fA-F]+[--]?\s"
hex = re.search(hexaPattern, part)
if hex:
    print("found a match")
This works for all of the above, but it doesn't match --77094A27E09 in this block:
<div id="arrow2" class="headerLinksImg" style="display:block
--77094A27E09
;">
But it matches the same string in:
<input type="checkbox" name="checkbox" id="checkboxKG3" class
--77094A27E09
Content-T="checkboxKG" value="KG3" />
What am I doing wrong?
import re

# Compile once; group 1 captures the hex digits, (?:--)? is the optional suffix.
hexaPattern = re.compile(r'\s--([0-9a-fA-F]+)(?:--)?\s')
m = hexaPattern.search(part)
if m:
    print("found a match: " + m.group(1))
This pre-compiles the pattern for speed. It uses an r'' (raw string) so the backslashes are sure to be passed through to the regex engine unchanged. It adds parentheses to make a "match group" so you can extract your hex string after the match, and it puts a "non-capturing group", (?:--), around the second -- string so the ? applies to both dashes together.
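As a quick check, here is a minimal sketch that runs the pattern against the two HTML snippets from the question. The names div_block, input_block and find_hex are made up for illustration, and the sketch assumes the line breaks in the input are plain "\n" characters.

```python
import re

# Same pattern as in the answer above.
hexa_pattern = re.compile(r'\s--([0-9a-fA-F]+)(?:--)?\s')

# The two snippets from the question, written as string literals.
# Assumption: the line breaks are plain "\n" characters.
div_block = ('<div id="arrow2" class="headerLinksImg" style="display:block\n'
             '--77094A27E09\n'
             ';">')
input_block = ('<input type="checkbox" name="checkbox" id="checkboxKG3" class\n'
               '--77094A27E09\n'
               'Content-T="checkboxKG" value="KG3" />')

def find_hex(text):
    """Return the captured hex digits, or None when nothing matches."""
    m = hexa_pattern.search(text)
    return m.group(1) if m else None

print(find_hex(div_block))    # 77094A27E09
print(find_hex(input_block))  # 77094A27E09
```

Note that the pattern still requires whitespace on both sides, so a hex string at the very start or very end of the text would not be found by this sketch.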
Because you used the square brackets around the second "--", you got a "character class". In a character class, a '-' is usually used for a range, as in [a-z], but [--] has no end point for a range, so Python's re module treats it as a class that matches a single '-' character. The problem is: because you have the ? after it, it only matches zero or one '-' character, and you need it to be able to match two. With a -- suffix, the second '-' is left over, the \s that follows in the pattern cannot match it, and the whole match fails.
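The difference can be seen in a small experiment. This is only a sketch: the variable names and the sample string are made up for illustration, and it relies on re.fullmatch, which needs Python 3.4 or later.

```python
import re

old_pattern = re.compile(r'\s--[0-9a-fA-F]+[--]?\s')      # character class
new_pattern = re.compile(r'\s--([0-9a-fA-F]+)(?:--)?\s')  # non-capturing group

# Hypothetical sample: a hex string with the optional -- suffix.
sample = "\n--77094A27E09--\n"

# [--] is a character class that matches exactly one '-'.
class_matches_one_dash = re.fullmatch(r'[--]', '-') is not None
# With ? it matches zero or one '-', never two.
class_matches_two_dashes = re.fullmatch(r'[--]?', '--') is not None

old_found = old_pattern.search(sample) is not None
new_match = new_pattern.search(sample)

print(class_matches_one_dash)    # True
print(class_matches_two_dashes)  # False
print(old_found)                 # False
print(new_match.group(1))        # 77094A27E09
```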