Python regex issue with optional substring in between

Tags: python, regex

I've been banging my head against this for two days. I'm trying to match packet content with Python's regex API:

from re import DOTALL, search

packet_re = (r'.*RADIUS.*\s*Accounting(\s|-)Request.*(Framed(\s|-)IP(\s|-)Address.*Attribute.*Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?.*(Username|User-Name)(\s|-)Attribute.*Value:\s*(?P<username>\S+).*')

packet1 = """
IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto UDP (17), length 213)
    10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185
    Accounting-Request (4), id: 0x0a, Authenticator: 41b3b548c4b7f65fe810544995620308
      Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11
        0x0000:  0a0a 0a0b
      User-Name Attribute (1), length: 14, Value: 005056969256
        0x0000:  3030 3530 3536 3936 3932 3536
"""
result = search(packet_re, packet1, DOTALL)

The regex matches, but it fails to capture the Framed-IP-Address attribute (client_ip should be 10.10.10.11). The Framed-IP-Address attribute may or may not be present in the packet, so that part of the pattern is enclosed in a group followed by ?, meaning 0 or 1 occurrences.

I should be able to ignore the attribute when it is absent, so the packet content can also be:

packet2 = """
IP (tos 0x0, ttl 64, id 60162, offset 0, flags [DF], proto UDP (17), length 163)
    20.20.20.1.54035 > 20.20.20.2.1813: [udp sum ok] RADIUS, length: 135
    Accounting-Request (4), id: 0x01, Authenticator: 219b694bcff639221fa29940e8d2a4b2
      User-Name Attribute (1), length: 14, Value: 005056962f54
        0x0000:  3030 3530 3536 3936 3266 3534
"""

The regex should ignore Framed-IP-Address in this case. It does, but it also fails to capture the attribute when it is present.

tcpip asked Sep 16 '25 05:09


1 Answer

I suggest using

RADIUS.*?Accounting[\s-]Request(?:.*?(Framed[\s-]IP[\s-]Address.*?Attribute(?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?))?.*User-?[nN]ame[\s-]Attribute.*?Value:\s*(?P<username>\S+)

See the regex demo.
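As a rough local check, the sketch below runs the pattern from the question and the suggested pattern against the two packets from the question. The names ORIGINAL_RE, SUGGESTED_RE and extract are made up for this illustration, and re.DOTALL is assumed as in the question. As far as I can tell, the original pattern skips the optional group because the greedy .* in front of it backtracks only as far as User-Name, where the group can legally match nothing.

```python
import re

# Pattern from the question: greedy .* directly before the optional group.
ORIGINAL_RE = re.compile(
    r".*RADIUS.*\s*Accounting(\s|-)Request.*"
    r"(Framed(\s|-)IP(\s|-)Address.*Attribute.*Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?"
    r".*(Username|User-Name)(\s|-)Attribute.*Value:\s*(?P<username>\S+).*",
    re.DOTALL,
)

# Suggested pattern: the lazy .*? sits inside the optional group.
SUGGESTED_RE = re.compile(
    r"RADIUS.*?Accounting[\s-]Request"
    r"(?:.*?(Framed[\s-]IP[\s-]Address.*?Attribute"
    r"(?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?))?"
    r".*User-?[nN]ame[\s-]Attribute.*?Value:\s*(?P<username>\S+)",
    re.DOTALL,
)

packet1 = """
IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto UDP (17), length 213)
    10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185
    Accounting-Request (4), id: 0x0a, Authenticator: 41b3b548c4b7f65fe810544995620308
      Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11
        0x0000:  0a0a 0a0b
      User-Name Attribute (1), length: 14, Value: 005056969256
        0x0000:  3030 3530 3536 3936 3932 3536
"""

packet2 = """
IP (tos 0x0, ttl 64, id 60162, offset 0, flags [DF], proto UDP (17), length 163)
    20.20.20.1.54035 > 20.20.20.2.1813: [udp sum ok] RADIUS, length: 135
    Accounting-Request (4), id: 0x01, Authenticator: 219b694bcff639221fa29940e8d2a4b2
      User-Name Attribute (1), length: 14, Value: 005056962f54
        0x0000:  3030 3530 3536 3936 3266 3534
"""


def extract(pattern, packet):
    """Return (client_ip, username), or None if the pattern does not match."""
    m = pattern.search(packet)
    return (m.group("client_ip"), m.group("username")) if m else None


print(extract(ORIGINAL_RE, packet1))   # (None, '005056969256')
print(extract(SUGGESTED_RE, packet1))  # ('10.10.10.11', '005056969256')
print(extract(SUGGESTED_RE, packet2))  # (None, '005056962f54')
```

When the attribute is missing, client_ip is None, so the caller has to handle that case explicitly.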

Note that I removed .* from both ends of the pattern: you are using re.search, which, unlike re.match, does not require the match to start at the beginning of the string. If you need the whole input string, the match object exposes it through its .string attribute.
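The difference between re.search and re.match, and the .string attribute, can be seen on a small made-up input. The text and pattern below are illustrative only and are not taken from the question:

```python
import re

# Hypothetical two-line input; the interesting part is not at the start.
text = "header line\nRADIUS, length: 185"
pattern = r"RADIUS, length: (\d+)"

# re.match only tries at position 0, so it finds nothing here.
print(re.match(pattern, text))  # None

# re.search scans forward, so no leading .* is needed.
m = re.search(pattern, text)
print(m.group(0))         # RADIUS, length: 185
print(m.group(1))         # 185

# The full input is still reachable from the match object.
print(m.string == text)   # True
```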

Details

  • RADIUS - a word
  • .*? - any zero or more chars, as few as possible
  • Accounting - a word
  • [\s-] - a whitespace or hyphen
  • Request - a word
  • (?:.*? - start of an optional non-capturing group: any zero or more chars as few as possible, then...
    • (Framed[\s-]IP[\s-]Address.*?Attribute - Group 1: Framed + a whitespace or a hyphen + IP + whitespace/hyphen + Address + any zero or more chars as few as possible + Attribute
      • (?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))? - an optional non-capturing group matching any zero or more chars as few as possible + Value: + a space + Group "client_ip": four runs of one or more digits separated by literal dots
    • ) - end of the Group 1
  • )? - end of the outer non-capturing group
  • .* - any zero or more chars as many as possible
  • User-?[nN]ame - Username, UserName or User-name/User-Name
  • [\s-] - whitespace or hyphen
  • Attribute - a word
  • .*? - any zero or more chars as few as possible
  • Value: - a literal string
  • \s* - zero or more whitespaces
  • (?P<username>\S+) - Group "username": one or more non-whitespace chars
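
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch of one possible layout: the name PACKET_RE_VERBOSE and the shortened sample text are made up here. Because verbose mode ignores unescaped whitespace, the literal space after Value: is written as [ ].

```python
import re

PACKET_RE_VERBOSE = re.compile(
    r"""
    RADIUS .*?                          # "RADIUS", then as few chars as possible
    Accounting [\s-] Request            # "Accounting-Request" or "Accounting Request"
    (?: .*?                             # optional block: lazily skip ahead to
        ( Framed [\s-] IP [\s-] Address .*? Attribute    # Group 1
            (?: .*? Value:[ ]           # optional: lazily reach "Value: "
                (?P<client_ip> \d+ \. \d+ \. \d+ \. \d+ )
            )?
        )
    )?
    .*                                  # greedy skip up to the user name attribute
    User -? [nN] ame [\s-] Attribute
    .*? Value: \s*
    (?P<username> \S+ )
    """,
    re.DOTALL | re.VERBOSE,
)

# Shortened, illustrative input in the same shape as the packets in the question.
sample = (
    "RADIUS, length: 185\n"
    "Accounting-Request (4), id: 0x0a\n"
    "  Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11\n"
    "  User-Name Attribute (1), length: 14, Value: 005056969256\n"
)

m = PACKET_RE_VERBOSE.search(sample)
print(m.groupdict())  # {'client_ip': '10.10.10.11', 'username': '005056969256'}
```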

Wiktor Stribiżew answered Sep 19 '25 00:09