
Regular expression for hexadecimal string in Python not working

Tags: python, regex, hex

I have a regular expression to match strings like:

--D2CBA65440D

--77094A27E09

--77094A27E

--770

--77094A27E09--

Basically, it should match a hexadecimal string that is surrounded by one or more line breaks or whitespace characters, has the prefix --, and may or may not have -- as a suffix.

I use the following Python code, and it works fine most of the time:

hexaPattern = "\s--[0-9a-fA-F]+[--]?\s"
hex = re.search(hexaPattern, part)
if hex:
    print("found a match")

This works for all of the above, but it doesn't match --77094A27E09 in this block:

<div id="arrow2" class="headerLinksImg" style="display:block

--77094A27E09

;">

But it does match the same string in:

<input type="checkbox" name="checkbox" id="checkboxKG3" class

--77094A27E09

Content-T="checkboxKG" value="KG3" />

What am I doing wrong?

Darth Plagueis asked May 18 '26 05:05


1 Answer

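import re

# Pre-compile the pattern; group 1 captures the hex digits, and
# (?:--)? optionally matches the "--" suffix without capturing it.
hexaPattern = re.compile(r'\s--([0-9a-fA-F]+)(?:--)?\s')
m = hexaPattern.search(part)  # `part` is the text being searched, as in the question
if m:
    print("found a match: " + m.group(1))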

This pre-compiles the pattern for speed. It uses an r'' (raw string) so the backslashes are sure to be passed through to the regex engine unchanged. It adds parentheses to make a capturing group, so you can extract your hex string after the match; it also puts a non-capturing group, (?:--)?, around the second -- string, so the suffix is matched as a two-character unit.
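As a side note on the raw string: "\s" happens to survive in a normal string because Python does not treat it as a string escape, but other regex escapes are not so lucky. A minimal sketch of the difference, using "\b" and a made-up sample string (neither is from the question):

```python
import re

# In a normal string, "\b" is a single backspace character (0x08).
# In a raw string, r"\b" is backslash + b, which re reads as a word boundary.
plain = "\bfoo\b"
raw = r"\bfoo\b"

sample = "a foo b"  # made-up sample text for illustration

plain_match = re.search(plain, sample)  # looks for literal backspace characters
raw_match = re.search(raw, sample)      # looks for "foo" as a whole word

print(len("\b"), len(r"\b"))  # lengths of the two spellings
print(plain_match)
print(raw_match.group(0))
```

Only the raw-string pattern finds "foo" here, which is why using r'' for every regex is a safe habit.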

Because you used square brackets around the second "--", you got a character class. In a character class, a '-' is usually used for a range, as in [a-z], but in [--] there is nothing to form a range with, so the class just matches a single '-' character. The problem is that, with the ? after it, it can only match zero or one '-' character, and you need it to be able to match two.
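A quick way to check this is to run both patterns against the same hex string with and without the "--" suffix. This is only a sketch: the sample strings and variable names are made up for illustration, and the first pattern is the one from the question, written as a raw string.

```python
import re

# Pattern from the question (character class) and the version
# with a non-capturing group for the optional "--" suffix.
with_class = re.compile(r"\s--[0-9a-fA-F]+[--]?\s")
with_group = re.compile(r"\s--([0-9a-fA-F]+)(?:--)?\s")

# Made-up sample inputs: the same hex string without and with the suffix.
no_suffix = "\n--77094A27E09\n"
suffix = "\n--77094A27E09--\n"

# [--] on its own matches exactly one hyphen, never two.
single_hyphen = re.fullmatch(r"[--]", "-")
double_hyphen = re.fullmatch(r"[--]", "--")

for name, pattern in (("class", with_class), ("group", with_group)):
    for label, text in (("no suffix", no_suffix), ("suffix", suffix)):
        found = pattern.search(text) is not None
        print(name, "|", label, "|", "match" if found else "no match")
```

With these inputs, the character-class pattern should fail only on the string that ends in "--", while the non-capturing-group pattern matches both.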

steveha answered May 19 '26 19:05