
Python: how to parse and check the time?

How do I extract the IP address that occurs 10 times within a one-second time interval?

In the following case:

241.7118.197.10

28.252.8

Maria asked Dec 24 '16 00:12


1 Answer

You could collect the data into a dict where the IP is the key and the value holds the timestamps seen for that IP. Every time a timestamp is added, you can check whether that IP has COUNT timestamps within one second. The example below uses COUNT = 3; set it to 10 for the case in the question:

from datetime import datetime, timedelta
from collections import defaultdict, deque
import re

THRESHOLD = timedelta(seconds=1)
COUNT = 3

res = set()
d = defaultdict(deque)

with open('test.txt') as f:
    for line in f:
        # Capture IP and timestamp; skip lines that do not match
        m = re.match(r'(\S*)[^\[]*\[(\S*)', line)
        if m is None:
            continue
        ip, dt = m.groups()

        # Parse timestamp
        dt = datetime.strptime(dt, '%d/%b/%Y:%H:%M:%S:%f')

        # Remove timestamps from deque if they are older than threshold
        que = d[ip]
        while que and (dt - que[0]) > THRESHOLD:
            que.popleft()

        # Add timestamp, update result if there are COUNT or more items
        que.append(dt)
        if len(que) >= COUNT:
            res.add(ip)

print(res)

Result:

{'28.252.89.140'}

The code above reads the log file line by line. For every line, a regular expression captures the data in two groups: IP and timestamp. Then strptime is used to parse the timestamp.

The first group (\S*) captures everything up to the first whitespace character, which is the IP. Then [^\[]* matches everything that is not [ and \[ matches the opening bracket just before the timestamp. Finally (\S*) is used again to capture everything up to the next whitespace, which is the timestamp. See example on regex101.
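As a quick check of the pattern, here is a minimal sketch that applies it to a single line. The sample line is hypothetical (the actual log format from the question is not shown here), and the helper name parse_line is made up for illustration; the regular expression and the strptime format are the ones used in the answer.

```python
import re
from datetime import datetime

# Hypothetical log line; the real format in the question may differ.
SAMPLE = '28.252.89.140 - - [24/Dec/2016:00:12:01:123456 +0000] "GET / HTTP/1.1" 200 512'

# Same pattern as in the answer: IP, anything up to '[', then the timestamp.
LINE_RE = re.compile(r'(\S*)[^\[]*\[(\S*)')


def parse_line(line):
    """Return (ip, timestamp), or None if the line has no '[' to anchor on."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ip, raw = m.groups()
    # Raises ValueError if the timestamp is not in the assumed format.
    return ip, datetime.strptime(raw, '%d/%b/%Y:%H:%M:%S:%f')


print(parse_line(SAMPLE))
```

If your log uses a different timestamp layout, only the format string passed to strptime should need to change.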

Once we have the IP and the time, they are added to a defaultdict where the IP is the key and the value is a deque of timestamps. Before a new timestamp is added, entries at the front of the deque are removed if they are more than THRESHOLD older than the new one. This assumes that the log lines are already sorted by time. After the addition the length is checked, and if there are COUNT or more items in the deque the IP is added to the result set.
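The same sliding-window idea can be tried without a log file. The sketch below wraps it in a function that takes already parsed (ip, timestamp) pairs; the function name frequent_ips, its parameters and the sample events are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def frequent_ips(events, count=3, window=timedelta(seconds=1)):
    """Return the IPs that appear at least `count` times within `window`.

    `events` is an iterable of (ip, datetime) pairs, assumed sorted by time.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ip, ts in events:
        que = recent[ip]
        # Drop timestamps that are more than `window` older than the current one.
        while que and ts - que[0] > window:
            que.popleft()
        que.append(ts)
        if len(que) >= count:
            flagged.add(ip)
    return flagged


# Made-up events: 10.0.0.1 has three hits within one second, 10.0.0.2 does not.
base = datetime(2016, 12, 24, 0, 12, 0)
events = [
    ('10.0.0.1', base),
    ('10.0.0.2', base + timedelta(milliseconds=100)),
    ('10.0.0.1', base + timedelta(milliseconds=400)),
    ('10.0.0.1', base + timedelta(milliseconds=900)),
    ('10.0.0.2', base + timedelta(seconds=2)),
    ('10.0.0.2', base + timedelta(seconds=4)),
]
print(frequent_ips(events))  # {'10.0.0.1'}
```

For the case in the question you would call it with count=10. Because each deque only keeps timestamps from the last second, memory use stays proportional to the number of recent hits per IP rather than to the size of the log.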

niemmi answered Oct 07 '22 05:10