I want to process every line in my log file and extract the IP address if the line matches one of my patterns. There are several different types of messages; in the example below I am using `p1` and `p2`.
I could read the file line by line and match each line against each pattern. But since there can be many more patterns, I would like to do it as efficiently as possible. I was hoping to compile those patterns into one object and do the match only once per line:
import re
import sys

IP = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
p1 = 'Registration from ' + IP + ' - Wrong password'
p2 = 'Call from ' + IP + ' rejected because extension not found'
# This compile call fails: the named group 'ip' appears twice in the pattern
c = re.compile(r'(?:' + p1 + '|' + p2 + ')')

for line in sys.stdin:
    match = c.search(line)
    if match:
        print(match['ip'])
But the above code does not work; it complains that `ip` is used twice.
What is the most elegant way to achieve my goal?
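For reference, the failure can be reproduced in isolation. The snippet below is only a minimal sketch: the helper `compiles` and the variables `single` and `double` are hypothetical names introduced for illustration, and the patterns are shortened versions of `p1` and `p2`.

```python
import re

IP = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

def compiles(pattern):
    """Return True if `pattern` compiles with the standard `re` module."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Using the named group once is accepted; using the same name twice
# in one pattern is rejected by `re`, even across alternatives.
single = 'Registration from ' + IP
double = '(?:Registration from ' + IP + '|Call from ' + IP + ')'
print(compiles(single), compiles(double))
```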
EDIT:
I have modified my code based on the answer from @Dev Khadka, but I am still struggling with how to properly handle the multiple `ip` matches. The code below prints all IPs that matched `p1`:
for line in sys.stdin:
    match = c.search(line)
    if match:
        print(match['ip1'])
But some lines don't match `p1`; they match `p2`. That is, I get:
1.2.3.4
None
2.3.4.5
...
How do I print the matching IP when I don't know whether it was `p1`, `p2`, ...? All I want is the IP. I don't care which pattern it matched.
You can consider installing the excellent `regex` module, which supports many advanced regex features, including branch reset groups, designed to solve exactly the problem you outlined in this question. Branch reset groups are denoted by `(?|...)`. Capture groups at the same position, or with the same name, in the different alternatives of a branch reset group share the same capture group in the output.
Notice that in the example below, whichever alternative matches, its capture group becomes the named capture group `ip`, so you don't need to iterate over multiple groups searching for a non-empty one:
import sys

import regex  # third-party module, not the standard library `re`

ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
# Join the alternatives, substitute the IP group, and wrap in a branch reset group
pattern = regex.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))

for line in sys.stdin:
    match = pattern.search(line)
    if match:
        print(match['ip'])
Demo: https://repl.it/@blhsing/RegularEmbellishedBugs
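Why don't you check which pattern matched?
# A match object does not support `in`; a group that did not participate is None
if match['ip1'] is not None:
    print(match['ip1'])
if match['ip2'] is not None:
    print(match['ip2'])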
or something like:
names = ['ip1', 'ip2', 'ip3']
groups = match.groupdict()  # maps each named group to its text, or None
for n in names:
    if groups.get(n) is not None:
        print(groups[n])
or even
num = 1000  # can easily handle millions of patterns =)
groups = match.groupdict()  # maps each named group to its text, or None
for i in range(1, num + 1):
    name = 'ip%d' % i
    if groups.get(name) is not None:
        print(groups[name])
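Putting the idea above together with the standard `re` module might look roughly like the following. This is a sketch rather than a definitive implementation: the names `templates`, `alternatives`, `combined`, and `extract_ip` are hypothetical, and it assumes each pattern contains exactly one IP group, numbered `ip1`, `ip2`, and so on.

```python
import re

IP = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
templates = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found',
]

# Give each alternative its own group name (ip1, ip2, ...) so that
# `re` does not reject the combined pattern for a duplicate name.
alternatives = [
    t.format(ip='(?P<ip%d>%s)' % (i, IP))
    for i, t in enumerate(templates, start=1)
]
combined = re.compile('|'.join(alternatives))

def extract_ip(line):
    """Return the IP from whichever alternative matched, or None."""
    match = combined.search(line)
    if match is None:
        return None
    # Only one alternative can match, so only one group is not None.
    return next(v for v in match.groups() if v is not None)
```

With this, the loop over `sys.stdin` only needs to call `extract_ip(line)` and print the result when it is not `None`, without caring which pattern matched.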