
Python regex to match multiple times, store results separately

Tags: python, regex

I'm a network engineer trying to dip my toes into programming, and Python was recommended to me.

I'm trying to extract specific data by matching a multi-line string against a regular expression. The data to work with is stored in SourceData:

SourceData = '''
ip route 22.22.22.22 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 33.33.33.33 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.22.33.44 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.12.11 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.13.11 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.14.0 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 44.44.44.0 255.255.255.0 TenGigabitEthernet0/1/0 1.1.1.1'''

The number of lines stored in SourceData is not known in advance. It can be anything from 0 lines (empty) upward.

I want to match all lines containing IPv4 addresses starting with 11.

This is what I've come up with as a start:

import re

ip1 = re.search(r'11\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}', SourceData)
if ip1:
    ip1 = ip1.group()  # replace the match object with the matched text

Verify:

>>> print(ip1)
11.22.33.44

OK, seems to work. The idea is that when the whole SourceData is matched, with the example provided, the final result for this case would be 4 matches:

ip1 = 11.22.33.44
ip2 = 11.11.12.11
ip3 = 11.11.13.11
ip4 = 11.11.14.0

Next, how do I keep checking SourceData for more matches as described above, and how do I store the multiple matches for use later in the code? For example, later in the code I would like to use the value from a specific match, let's say match number 4 (11.11.14.0).

I have read some guides on Python and regex, but it seems I don't quite understand them :)

asked Nov 21 '17 at 19:11 by w00tw00t



2 Answers

You can use re.findall to return all of the matches as a list of strings:

>>> re.findall(r'11\.\d{1,3}\.\d{1,3}\.\d{1,3}', SourceData)
['11.22.33.44', '11.11.12.11', '11.11.13.11', '11.11.14.0']
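The question also asks how to keep the matches around and pick out one of them later. A minimal sketch, assuming the list returned by re.findall is all that is needed: the names `source_data`, `PATTERN` and `find_routes` are made up for this illustration and are not part of the original answer.

```python
import re

# Sample data copied from the question; in practice this could be empty.
source_data = """
ip route 22.22.22.22 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 33.33.33.33 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.22.33.44 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.12.11 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.13.11 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.14.0 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 44.44.44.0 255.255.255.0 TenGigabitEthernet0/1/0 1.1.1.1
"""

# Same pattern as in the answer above (hypothetical constant name).
PATTERN = r'11\.\d{1,3}\.\d{1,3}\.\d{1,3}'


def find_routes(text):
    """Return every address starting with 11. as a list (empty if none)."""
    return re.findall(PATTERN, text)


matches = find_routes(source_data)

# Lists are zero-indexed, so "match number 4" lives at index 3.
# Checking the length first avoids an IndexError on short input.
fourth = matches[3] if len(matches) >= 4 else None
print(fourth)  # 11.11.14.0

# Empty input gives an empty list, so no special case is needed.
print(find_routes(""))  # []
```

A single list replaces the separate ip1 ... ip4 variables, which matters because the number of lines is not known in advance. If the data could ever contain addresses such as 211.1.1.1, this pattern would match the 11.1.1.1 inside them; prefixing it with a lookbehind like (?<![\d.]) is one way to guard against that.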
answered Sep 28 '22 at 05:09 by Cory Kramer


There are several ways to do this, one of them being:

import re

string = """
ip route 22.22.22.22 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 33.33.33.33 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.22.33.44 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.12.11 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.13.11 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.14.0 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 44.44.44.0 255.255.255.0 TenGigabitEthernet0/1/0 1.1.1.1
"""

rx = re.compile(r'^[^\d\n]*(11(?:\.\d+){3})', re.M)

lines = [match.group(1) for match in rx.finditer(string)]
print(lines)    

This yields:

['11.22.33.44', '11.11.12.11', '11.11.13.11', '11.11.14.0']


The core of the pattern is:
^              # match start of the line (with re.M)
[^\d\n]*       # NOT a digit or a newline, 0+ times
(              # start of capture group 1
  11           # literal 11
  (?:\.\d+){3} # a dot followed by 1+ digits, three times
)              # end of capture group 1

The rest is done via re.finditer() and a list comprehension.
See a demo on regex101.com.
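The commented breakdown can also live inside the pattern itself. The following is a sketch rather than the answerer's own code: it assumes the re.VERBOSE flag (which ignores whitespace and # comments in the pattern) combined with re.MULTILINE, and uses a shortened sample taken from the question. The names `text`, `found` and `addresses` are illustrative.

```python
import re

# Shortened sample taken from the question's data.
text = """
ip route 22.22.22.22 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.22.33.44 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
ip route 11.11.14.0 255.255.255.255 TenGigabitEthernet0/1/0 1.1.1.1
"""

# re.VERBOSE lets the pattern carry its own comments;
# re.MULTILINE makes ^ match at the start of every line.
rx = re.compile(r"""
    ^               # start of a line
    [^\d\n]*        # anything except digits or newlines, 0+ times
    (               # capture group 1: the address
      11            # literal 11
      (?:\.\d+){3}  # a dot plus 1+ digits, three times
    )
""", re.MULTILINE | re.VERBOSE)

# finditer yields match objects, so each address can be stored
# together with the offset where it starts in the text.
found = [(m.group(1), m.start(1)) for m in rx.finditer(text)]
addresses = [addr for addr, _ in found]
print(addresses)  # ['11.22.33.44', '11.11.14.0']
```

Because the pattern is anchored at the line start and skips only non-digit characters, it looks at the first address on each line and ignores later fields such as the next hop.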

answered Sep 28 '22 at 05:09 by Jan