 

Python: read multi-line log entries into single lines while reading a file line by line

Tags:

python

I want to convert the following file:

mwe.log

07:23:07.754 A
07:23:07.759 B  
C
D
E
07:23:07.770 I
07:23:07.770 II
07:23:07.770 III

I would expect

07:23:07.754 A
07:23:07.759 B C D E
07:23:07.770 I
07:23:07.770 II
07:23:07.770 III

from running this code:

import re

input_file = "mwe.log"


def read_logfile(full_file, start):
    result_intermediate_line = ''
    with open(input_file, 'r') as fin:
        for _raw_line in fin:
            log_line = _raw_line.rstrip()
            #result = ''
            if start.match(log_line):
                if len(result_intermediate_line) > 0:
                    result = result_intermediate_line
                else:
                    result = log_line
            else:
                result = result_intermediate_line + log_line

            yield result


if __name__ == "__main__":
    number_line = re.compile(r'^\d+\:\d+\:\d+\.\d+\s+')
    for line in read_logfile(input_file, number_line):
        print(line)

It should work with Python 3.7 and above. My goal is to have every entry on a single line that starts with a timestamp, as shown above, so that I can post-process each entry as one line. In other words, it is a converter from format 1 to format 2.
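Once every entry sits on one line, post-processing can treat each line as a single record. A minimal sketch, assuming each entry starts with an `HH:MM:SS.mmm` timestamp followed by whitespace (the same shape the question's `number_line` regex matches); `LogEntry` and `parse_entry` are hypothetical names, not part of the original code:

```python
import re
from typing import NamedTuple

# Same timestamp shape as the question's regex, with capture groups added
# for the timestamp and the rest of the line (an assumption of this sketch).
ENTRY_RE = re.compile(r"^(\d+:\d+:\d+\.\d+)\s+(.*)$")


class LogEntry(NamedTuple):
    """Hypothetical record type for one joined log entry."""
    timestamp: str
    message: str


def parse_entry(line):
    """Split one single-line entry into timestamp and message (hypothetical helper)."""
    match = ENTRY_RE.match(line)
    if match is None:
        # Continuation lines should already have been joined at this point.
        raise ValueError(f"not a timestamped entry: {line!r}")
    return LogEntry(match.group(1), match.group(2))


print(parse_entry("07:23:07.759 B C D E"))
# → LogEntry(timestamp='07:23:07.759', message='B C D E')
```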

Any idea where the bug is?

asked Jan 21 '26 22:01 by Peter Ebelsberger


1 Answer

This should work:

import re

input_file = "mwe.log"


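def read_logfile(input_file, start):
    """Yield one entry per timestamped line, joining continuation lines to it."""
    with open(input_file, "r") as fin:
        # Use a default so an empty file ends the generator cleanly instead of
        # raising RuntimeError (PEP 479) from a bare next().
        first_line = next(fin, None)
        if first_line is None:
            return
        result_intermediate_line = first_line.rstrip()
        for _raw_line in fin:
            log_line = _raw_line.rstrip()
            if start.match(log_line):
                # A new timestamp starts a new entry: emit the finished one.
                previous_line = result_intermediate_line
                result_intermediate_line = log_line
                yield previous_line
            else:
                # No timestamp: this line continues the current entry.
                result_intermediate_line += " " + log_line
        # The last entry has no following timestamp to trigger it.
        yield result_intermediate_line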


if __name__ == "__main__":
    number_line = re.compile(r"^\d+\:\d+\:\d+\.\d+\s+")
    for line in read_logfile(input_file, number_line):
        print(line)

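The problem is that your generator yields a result for every input line, and `result_intermediate_line` is never updated after being set to `''`, so continuation lines are never accumulated (it also opens the global `input_file` rather than its `full_file` parameter). Instead, I only yield when the new line starts with a timestamp, emitting the entry collected so far; otherwise I append the line to the current entry. The last entry is yielded after the loop, since no further timestamp follows it.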
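The same buffering logic does not depend on the file itself: a file object is just an iterable of lines. A minimal sketch of that idea, with a hypothetical `join_continuations` generator that accepts any iterable of strings so it can be tried on an in-memory list (this is a restructuring of the answer's approach, not the answer's own code):

```python
import re
from typing import Iterable, Iterator

NUMBER_LINE = re.compile(r"^\d+:\d+:\d+\.\d+\s+")


def join_continuations(lines: Iterable[str], start: re.Pattern) -> Iterator[str]:
    """Yield one entry per timestamped line, appending untimestamped lines to it."""
    current = None  # entry being collected; None until the first line is seen
    for raw_line in lines:
        line = raw_line.rstrip()
        if current is None or start.match(line):
            # A timestamp starts a new entry, so the previous one is complete.
            if current is not None:
                yield current
            current = line
        else:
            # Continuation line: append it to the entry being collected.
            current += " " + line
    if current is not None:
        # Flush the last entry; no later timestamp will do it.
        yield current


sample = [
    "07:23:07.754 A\n",
    "07:23:07.759 B  \n",
    "C\n",
    "D\n",
    "E\n",
    "07:23:07.770 I\n",
    "07:23:07.770 II\n",
    "07:23:07.770 III\n",
]
for entry in join_continuations(sample, NUMBER_LINE):
    print(entry)
```

With a real file, `with open("mwe.log") as fin:` followed by `for entry in join_continuations(fin, NUMBER_LINE):` reads it the same way, and an empty input simply yields nothing.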

answered Jan 23 '26 13:01 by Matteo Zanoni


