My file contains either 45 or 48 hex numbers, separated by whitespace. I need ALL of those numbers individually, not as one string. I am currently using a brute-force pattern to get 45 numbers.
pattern = re.compile("([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s")
However, even with this, I still can't figure out how to extract the remaining three numbers when a line contains 48 hex numbers. Could you please help me simplify this?
I would like to avoid solutions like the one below (I haven't tested whether it works), because even if it gives the proper output, I would still have to split the matched string for each instance afterwards.
(((?:[0-9a-f]{2})\s){48})|(((?:[0-9a-f]{2})\s){45})
Thank you!
When writing long REs, consider using re.VERBOSE to make them more readable.
pattern = re.compile(r"""
^( [0-9a-fA-F]{2} (?: \s [0-9a-fA-F]{2} ){44}
(?:(?: \s [0-9a-fA-F]{2} ){3} )? )$
""", re.VERBOSE)
Read as: two hex digits, followed by 44 times (space followed by two hex digits), optionally followed by 3 times (space followed by two hex digits).
Test:
>>> pattern.match(" ".join(["0f"] * 44))
>>> pattern.match(" ".join(["0f"] * 45))
<_sre.SRE_Match object at 0x7fd8f27e0738>
>>> pattern.match(" ".join(["0f"] * 46))
>>> pattern.match(" ".join(["0f"] * 47))
>>> pattern.match(" ".join(["0f"] * 48))
<_sre.SRE_Match object at 0x7fd8f27e0990>
>>> pattern.match(" ".join(["0f"] * 49))
Then finally, to retrieve the individual numbers, call .group(0).split() on the match result. That's much easier than writing an RE that puts every number into a separate group.
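A minimal sketch of the match-then-split approach, wrapped in a helper. The function name extract_hex and the strip() call are assumptions added for illustration; the pattern is the verbose one from above. Stripping is there because the anchored pattern does not allow leading or trailing spaces.

```python
import re

# Same pattern as above: 45 hex numbers, optionally followed by 3 more.
pattern = re.compile(r"""
    ^( [0-9a-fA-F]{2} (?: \s [0-9a-fA-F]{2} ){44}
    (?:(?: \s [0-9a-fA-F]{2} ){3} )? )$
""", re.VERBOSE)

def extract_hex(line):
    """Return the list of hex strings, or None if the line does not
    contain exactly 45 or 48 two-digit hex numbers."""
    # Assumption: surrounding whitespace is not significant, so drop it.
    m = pattern.match(line.strip())
    if m is None:
        return None
    # Validation is done by the RE; splitting yields the individual numbers.
    return m.group(0).split()
```

If integer values are needed rather than strings, each item can be converted with int(item, 16).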
EDIT: alright, here's how to solve the original problem. Just construct the RE dynamically.
chunk = r"([0-9a-fA-F]{2})\s"
pattern = re.compile(chunk * 45 + "(?:" + chunk * 3 + ")?")
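As a rough sketch of how the dynamically built pattern might be used: the variant below keeps the \s outside the capturing group so each group holds only the two hex digits, and the helper name numbers is hypothetical. Groups inside the optional part come back as None when only 45 numbers are present, so they are filtered out. Note that this pattern is not anchored at the end, so input with 46 or 47 numbers would still match its first 45, and every number, including the last, must be followed by whitespace.

```python
import re

# Variant of the chunk that captures only the digits, not the whitespace.
chunk = r"([0-9a-fA-F]{2})\s"
pattern = re.compile(chunk * 45 + "(?:" + chunk * 3 + ")?")

def numbers(text):
    """Return the captured hex strings (45 or 48 of them), or None
    if fewer than 45 whitespace-terminated numbers are found."""
    m = pattern.match(text)
    if m is None:
        return None
    # Groups 46-48 are None when the optional tail did not match.
    return [g for g in m.groups() if g is not None]
```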