How to grab the lines AFTER a matched line in Python

I'm an amateur who has been using Python on and off for some time now. Sorry if this is a silly question, but does anyone know an easy way to grab a block of lines when the input file is formatted like this:

" Heading 1

Line 1

Line 2

Line 3

Heading 2

Line 1

Line 2

Line 3 "

I won't know how many lines follow each heading, but I want to grab them all. All I know is the heading's name, or a regular expression pattern that matches it.

The only way I know to read a file is the "for line in file:" way, but I don't know how to grab the lines AFTER the line I'm currently on. Hope this makes sense, and thanks for the help!
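
When you know the exact heading and only want the lines after it, one option is to hand the line iterator to itertools.dropwhile (skip up to the heading) and itertools.takewhile (stop at the next heading). This is a minimal sketch, not taken from the answers below; it assumes every heading looks like "Heading <number>", and the names HEADING_RE and lines_after are hypothetical. Blank lines are dropped.

```python
import itertools
import re

# Assumption: every heading looks like "Heading <number>"; adjust for real data.
HEADING_RE = re.compile(r"^Heading \d+$")


def lines_after(lines, target):
    """Return the non-blank lines that follow the heading `target`,
    up to the next heading or the end of the input."""
    stripped = (line.strip() for line in lines)
    # Skip everything before the target heading; dropwhile yields the heading itself.
    rest = itertools.dropwhile(lambda s: s != target, stripped)
    next(rest, None)  # consume the target heading (no-op if it was never found)
    # Keep reading until a line looks like another heading.
    body = itertools.takewhile(lambda s: not HEADING_RE.match(s), rest)
    return [s for s in body if s]


sample = "Heading 1\n\nLine 1\n\nLine 2\n\nHeading 2\n\nLine A\n"
print(lines_after(sample.splitlines(), "Heading 2"))  # ['Line A']
```

With a real file you can pass the open file object directly, e.g. lines_after(f, "Heading 1"); since everything is lazy, the file is read only up to the next heading.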

*Thanks for all the responses! I have tried to implement some of the solutions, but my problem is that not all the headings have the same name, and I'm not sure how to work around that. I need a different regular expression for each one... any suggestions?*
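
One way to handle headings that each need their own regular expression is to join the patterns into a single alternation and test every line against it. This is a sketch under assumptions: the three patterns in HEADING_PATTERNS are made-up examples, and is_heading is a hypothetical helper name.

```python
import re

# Hypothetical heading formats; replace them with the patterns your file uses.
HEADING_PATTERNS = [r"Heading \d+", r"Summary:.*", r"\[\w+\]"]

# Join the patterns into one alternation, each in its own non-capturing group,
# and anchor it so a line is a heading only if one pattern matches all of it.
HEADING_RE = re.compile(
    "^(?:" + "|".join(f"(?:{p})" for p in HEADING_PATTERNS) + ")$"
)


def is_heading(line):
    """True if the stripped line matches any of the heading patterns."""
    return HEADING_RE.match(line.strip()) is not None


for text in ["Heading 12", "[Config]", "Summary: done", "Line 1"]:
    print(text, is_heading(text))
```

A test like is_heading(line) can then take the place of the line.startswith(...) checks used in the answers below.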

asked Jan 04 '11 15:01 by toofly

2 Answers

Generator Functions

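def group_by_heading(some_source):
    """Yield lists of lines; each list normally starts with a heading line.
    Any lines before the first heading form a group with no heading."""
    buffer = []
    for line in some_source:
        if line.startswith("Heading"):
            # A new heading closes the previous group, if there is one.
            if buffer:
                yield buffer
            buffer = [line]
        else:
            buffer.append(line)
    # Emit the last group, but not an empty one (e.g. for an empty file).
    if buffer:
        yield buffer

with open("some_file", "r") as source:
    for heading_and_lines in group_by_heading(source):
        heading = heading_and_lines[0]
        lines = heading_and_lines[1:]
        # process away.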
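
For headings that do not share a common prefix, the same generator idea can take a compiled regular expression instead of the hard-coded startswith("Heading") test. The sketch below is a variant, not the code above: the name group_sections and the example pattern are assumptions. It yields (heading, body_lines) pairs, drops blank lines, and skips anything before the first heading.

```python
import io
import re


def group_sections(source, heading_re):
    """Yield (heading, body_lines) pairs; lines before the first heading are skipped."""
    heading, body = None, []
    for raw in source:
        line = raw.rstrip("\n")
        if heading_re.match(line):
            if heading is not None:
                yield heading, body  # close the previous section
            heading, body = line, []
        elif heading is not None and line.strip():
            body.append(line)  # non-blank line inside the current section
    if heading is not None:
        yield heading, body  # the last section


# Assumed heading formats for this demo: "Heading <n>" or "Chapter <letter>".
pattern = re.compile(r"^(?:Heading \d+|Chapter [A-Z])$")
text = "preamble\nHeading 1\nLine 1\n\nLine 2\nChapter A\nLine 3\n"
for heading, lines in group_sections(io.StringIO(text), pattern):
    print(heading, lines)
```

io.StringIO stands in for a real file here; an object returned by open() can be passed the same way.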

answered Nov 14 '22 23:11 by S.Lott

You could use a variable to mark which heading you are currently tracking and, once it is set, grab every line until you find another heading:

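data = {}
heading = None  # the heading currently being collected; none seen yet
for line in file:  # file: an already-open file object
    line = line.strip()
    if not line:
        continue  # skip blank lines

    if line.startswith('Heading '):
        heading = line
        data.setdefault(heading, [])  # start a list the first time a heading appears
        continue

    if heading is not None:  # ignore any lines before the first heading
        data[heading].append(line)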

Here's a codepad.org snippet that shows how it works: http://codepad.org/KA8zGS9E

Edit: If you don't care about the actual heading values and just want a list at the end, you can use this:

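data = []
for line in file:  # file: an already-open file object
    line = line.strip()
    if not line:
        continue  # skip blank lines

    if line.startswith('Heading '):
        continue  # drop heading lines

    data.append(line)  # everything else is body text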

Basically, you don't need a variable to track the heading at all; you can simply filter out every line that matches the heading pattern.
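
The same filtering can be written as a single list comprehension, using a regular expression instead of startswith so that the heading pattern is easy to swap. This is a sketch; the "Heading <number>" pattern and the body_lines name are assumptions. As with the loop above, section boundaries are lost.

```python
import re

# Assumed heading format; swap in your own pattern (or a combined alternation).
HEADING_RE = re.compile(r"^Heading \d+$")


def body_lines(lines):
    """Return every non-blank line that is not a heading, in file order."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and not HEADING_RE.match(s)]


sample = ["Heading 1\n", "Line 1\n", "\n", "Heading 2\n", "Line 2\n"]
print(body_lines(sample))  # ['Line 1', 'Line 2']
```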

answered Nov 14 '22 23:11 by Alex Vidal