
Print a number of previous lines after matching string found in line in python

I'm writing a program to parse some log files. If an error code is in a line, I need to print the previous 25 lines for analysis. I'd like to be able to repeat this with more or fewer lines depending on the individual error code (15 or 35 lines instead of 25, for example).

with open(file, 'r') as input:
    for line in input:
        if "error code" in line:
            pass  # print previous 25 lines here

I know the equivalent command in Bash for what I need is grep "error code" -B 25 Filename | wc -l. I'm still new to Python and programming in general. I know I'm going to need a for loop, and I've tried using the range function, but I haven't had much luck because I don't know how to apply range to a file.

pjano1 asked Jan 27 '23 13:01


2 Answers

This is a perfect use case for a length-limited collections.deque:

from collections import deque

line_history = deque(maxlen=25)
with open(file) as input:
    for line in input:
        if "error code" in line: 
            print(*line_history, line, sep='')
            # Clear history so if two errors seen in close proximity, we don't
            # echo some lines twice
            line_history.clear()
        else:
            # When deque reaches 25 lines, will automatically evict oldest
            line_history.append(line)
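
The question also asks for a different amount of context per error code. A minimal sketch of one way to extend the deque approach follows; the codes E100/E200/E300, their line counts, and the function name context_before_errors are all hypothetical, and every count is assumed to be at least 1:

```python
from collections import deque

# Hypothetical error codes mapped to how many preceding lines each one needs.
CONTEXT_LINES = {"E100": 15, "E200": 25, "E300": 35}

def context_before_errors(lines, context_lines=CONTEXT_LINES):
    """Yield (code, preceding_lines, matching_line) for each line containing a code."""
    # Keep only as much history as the most demanding error code needs.
    history = deque(maxlen=max(context_lines.values()))
    for line in lines:
        # First configured code found in the line, or None if there is none.
        code = next((c for c in context_lines if c in line), None)
        if code is None:
            history.append(line)
            continue
        wanted = context_lines[code]
        # deque does not support slicing, so copy to a list first.
        yield code, list(history)[-wanted:], line
        # Same choice as above: clear so nearby errors don't repeat lines.
        history.clear()

sample = [f"line {n}\n" for n in range(40)] + ["E100 disk full\n"]
for code, before, hit in context_before_errors(sample):
    print(*before, hit, sep='')
```

Any iterable of lines works here, including an open file object, so the memory cost stays bounded by the largest context size rather than the file size.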

Complete explanation of why I chose this approach (skip if you don't really care):

This isn't solvable in a good/safe way using for/range, because indexing only makes sense if you load the whole file into memory; the file on disk has no idea where lines begin and end, so you can't just ask for "line #357 of the file" without reading it from the beginning to find lines 1 through 356. You'd either end up repeatedly rereading the file, or slurping the whole file into an in-memory sequence (e.g. list/tuple) to have indexing make sense.
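
To illustrate the point about line numbers, here is a small sketch using itertools.islice on an in-memory stand-in for a log file (the helper name nth_line and the sample contents are made up for the example). Even though it looks like a direct lookup, every earlier line still gets read and discarded:

```python
import io
from itertools import islice

def nth_line(stream, n):
    """Return line number n (1-based), or None if the stream is too short."""
    # islice still reads and throws away lines 1..n-1; it cannot jump ahead.
    return next(islice(stream, n - 1, n), None)

# Stand-in for a real log file, which would behave the same way.
log = io.StringIO("".join(f"entry {i}\n" for i in range(1, 1001)))
print(nth_line(log, 357), end='')  # → entry 357
```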

For a log file, you have to assume it could be quite large (I regularly deal with multi-gigabyte log files), to the point where loading it into memory would exhaust main memory, so slurping is a bad idea, and rereading the file from scratch each time you hit an error is almost as bad (it's slow, but it's reliably slow I guess?). The deque based approach means your peak memory usage is based on roughly the 26 longest lines in the file (the 25 held in the deque plus the line being processed), rather than the total file size.
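
The bounded behavior is easy to see with a tiny maxlen. This is only a toy demonstration with made-up strings, not part of the log parser:

```python
from collections import deque

recent = deque(maxlen=3)
for n in range(1, 6):
    # Once three items are held, each append silently drops the oldest one.
    recent.append(f"line {n}")

print(list(recent))  # → ['line 3', 'line 4', 'line 5']
```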

A naïve solution with nothing but built-ins could be as simple as:

with open(file) as input:
    lines = tuple(input)  # Slurps all lines from file
for i, line in enumerate(lines):
    if "error code" in line:
        print(*lines[max(i-25, 0):i], line, sep='')

but like I said, this requires enough memory to hold your entire log file in memory at once, which is a bad thing to count on. It also repeats lines when two errors occur in close proximity, because unlike deque, you don't get an easy way to empty your recent memory; you'd have to manually track the index of the last print to restrict your slice.
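
For completeness, manually tracking the last printed index might look something like the sketch below. The function name context_slices and its parameters are hypothetical, and it still assumes the whole file fits in memory as a list or tuple:

```python
def context_slices(lines, marker="error code", before=25):
    """Return one chunk per match: the context lines followed by the matching line."""
    chunks = []
    next_unprinted = 0  # index of the first line not yet included in any chunk
    for i, line in enumerate(lines):
        if marker in line:
            # Never reach back past what an earlier chunk already covered.
            start = max(i - before, next_unprinted)
            chunks.append(lines[start:i + 1])
            next_unprinted = i + 1
    return chunks

demo = ["a\n", "b\n", "error code 1\n", "c\n", "error code 2\n"]
for chunk in context_slices(demo, before=2):
    print(*chunk, sep='')
```

The extra bookkeeping variable does the job that line_history.clear() does for free in the deque version.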

Note that even then, I didn't use range; range is a crutch a lot of people coming from C backgrounds rely on, but it's usually the wrong way to solve a problem in Python. In cases where an index is needed (it usually isn't), you usually need the value too, so enumerate based solutions are superior; most of the time, you don't need an index at all, so direct iteration (or paired iteration with zip or the like) is the correct solution.
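
A short side-by-side sketch of the three styles, using a made-up three-line list in place of a real log:

```python
lines = ["start", "warn: low disk", "error code 7"]

# C-style: produce an index first, then look the value up with it.
for i in range(len(lines)):
    if "error code" in lines[i]:
        print(i, lines[i])

# enumerate hands over the index and the value together.
for i, line in enumerate(lines):
    if "error code" in line:
        print(i, line)

# zip pairs each line with the one before it; no index is needed at all.
for previous, current in zip(lines, lines[1:]):
    if "error code" in current:
        print(previous, "->", current)
```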

ShadowRanger answered Jan 30 '23 03:01


Try a basic approach using a for loop and the range function, without any special libraries:

N = 25
with open(file, 'r') as f:
    lines = f.read().splitlines()
    for i, line in enumerate(lines):
        if "error code" in line: 
            j = i-N if i>N else 0
            for k in range(j,i):
                print(lines[k])

The code above prints the previous 25 lines, or everything from the first line onward if fewer than 25 lines precede the match. Note that it reads the whole file into memory first.

Also, it is better to avoid using input as a variable name, since it shadows the built-in input() function in Python.

rnso answered Jan 30 '23 02:01