
Regular Expression for a pattern of 45 hex numbers OR 48 hex numbers - Python

Tags: python, regex

My file contains either 45 or 48 hex numbers, separated by whitespace. I need each of those numbers individually, not the whole run as a single string. I am currently using a brute-force pattern to capture 45 numbers:

pattern = re.compile("([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s([0-9a-f]{2})\s")

However, even with this, I still can't figure out how to extract the remaining three numbers when there are 48. Could you please help me simplify this?

I would rather avoid solutions like the one below (I haven't tested whether it works), because even if it matches correctly, I would still have to split the matched string for each instance.

(((?:[0-9a-f]{2})\s){48})|(((?:[0-9a-f]{2})\s){45})

Thank you!

Proteen asked Dec 03 '22 02:12


1 Answer

When writing long REs, consider using re.VERBOSE to make them more readable.

pattern = re.compile(r"""
 ^( [0-9a-fA-F]{2} (?: \s [0-9a-fA-F]{2} ){44}
                (?:(?: \s [0-9a-fA-F]{2} ){3} )? )$ 
""", re.VERBOSE)

Read as: two hex digits, followed by 44 times (space followed by two hex digits), optionally followed by 3 times (space followed by two hex digits).

Test:

>>> pattern.match(" ".join(["0f"] * 44))
>>> pattern.match(" ".join(["0f"] * 45))
<_sre.SRE_Match object at 0x7fd8f27e0738>
>>> pattern.match(" ".join(["0f"] * 46))
>>> pattern.match(" ".join(["0f"] * 47))
>>> pattern.match(" ".join(["0f"] * 48))
<_sre.SRE_Match object at 0x7fd8f27e0990>
>>> pattern.match(" ".join(["0f"] * 49))

Finally, to retrieve the individual numbers, call .group(0).split() on the match result. That is much easier than writing an RE that puts every number into its own group.
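Putting the validate-then-split idea together, here is a minimal sketch. The helper name extract_hex_numbers is made up for illustration, and it assumes each line uses exactly one whitespace character between numbers, as the pattern above requires.

```python
import re

# Same pattern as above: 45 two-digit hex numbers, optionally followed by 3 more.
HEX_LINE = re.compile(r"""
    ^( [0-9a-fA-F]{2} (?: \s [0-9a-fA-F]{2} ){44}
       (?: (?: \s [0-9a-fA-F]{2} ){3} )? )$
""", re.VERBOSE)


def extract_hex_numbers(line):
    """Return the 45 or 48 hex strings in `line`, or None if it does not match.

    Hypothetical helper, not part of the re module.
    """
    m = HEX_LINE.match(line)
    if m is None:
        return None
    # The regex only validates the shape; split() does the actual extraction.
    return m.group(0).split()


numbers = extract_hex_numbers(" ".join(["0f"] * 48))
print(len(numbers))  # 48
```

If you need integers rather than strings, something like [int(n, 16) for n in numbers] should do it.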

EDIT: Alright, here's how to solve the original problem, with each number in its own group: just construct the RE dynamically.

chunk = r"([0-9a-fA-F]{2})\s"
pattern = re.compile(chunk * 45 + "(?:" + chunk * 3 + ")?")
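For completeness, a rough sketch of how the dynamically built pattern might be used to pull out the numbers. The helper name hex_groups is hypothetical. Here the whitespace is kept outside the capturing group so that each group holds only the two hex digits, and, as in the question's pattern, every number (including the last) is assumed to be followed by one whitespace character.

```python
import re

# Whitespace sits outside the capturing group, so each group holds only the digits.
chunk = r"([0-9a-fA-F]{2})\s"
pattern = re.compile(chunk * 45 + "(?:" + chunk * 3 + ")?")


def hex_groups(text):
    """Return the captured hex strings, or None if fewer than 45 are present.

    Hypothetical helper for illustration only.
    """
    m = pattern.match(text)
    if m is None:
        return None
    # Groups 46-48 are None when the optional part did not match.
    return [g for g in m.groups() if g is not None]


print(len(hex_groups("0f " * 45)), len(hex_groups("0f " * 48)))  # 45 48
```

Note that this pattern has no end anchor, so input with 46 or 47 numbers still matches and yields the first 45. Append "$" to the pattern if that should be rejected instead.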

Fred Foo answered May 24 '23 13:05