How to only read lines in a text file after a certain string?

I'd like to read into a dictionary all of the lines in a text file that come after a particular string, and I'd like to do this over thousands of text files.

I'm able to identify and print out the particular string ('Abstract') using the following code (gotten from this answer):

for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)

But how do I tell Python to read only the lines that come after that string?

asked Jan 06 '15 19:01 by Brian Zelip


1 Answer

Just start another loop when you reach the line you want to start from:

for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f:  # now you are at the lines you want
                    print(line)  # do work here

A file object is its own iterator, so when we reach the line containing 'Abstract', the inner loop carries on from the next line and runs until the iterator is exhausted. The outer loop then has nothing left to read and ends as well.

A simple example:

gen = (n for n in range(8))

for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)

Produces:

In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
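To tie the generator demo back to files, here is a minimal sketch that uses `io.StringIO` as a stand-in for a file opened with `open()` (an assumption made only so it runs without files on disk). `lines_after` is a hypothetical helper name, not a library function: it returns the lines after the first one containing the marker, or an empty list if the marker never appears.

```python
import io

def lines_after(f, marker):
    """Return the lines after the first line containing `marker` ([] if absent)."""
    for line in f:
        if marker in line:
            # list(f) draws from the same iterator as the for loop,
            # so it starts at the line right after the marker line.
            return list(f)
    return []  # marker never found; the file was read to the end

# io.StringIO stands in for a file opened with open(...)
sample = io.StringIO("Title\nAbstract\nFirst line\nSecond line\n")
print(iter(sample) is sample)  # True: the file object is its own iterator
print(lines_after(sample, 'Abstract'))  # ['First line\n', 'Second line\n']
```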

You can also use itertools.dropwhile to skip lines until one contains 'Abstract'; calling next() on the result then discards that line itself:

from itertools import dropwhile

for files in filepath:
    with open(files, 'r') as f:
        # skip lines until one contains 'Abstract'
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')  # discard the 'Abstract' line itself
        for line in dropped:
            print(line)
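Since the question asks for the results in a dictionary across many files, here is a minimal sketch building on the `dropwhile` version. `collect_after_marker` is a hypothetical helper; keying the dictionary by file path and storing the remaining text as one string are assumptions about the desired layout, and the demo writes two temporary files only so the example runs on its own.

```python
from itertools import dropwhile
import os
import tempfile

def collect_after_marker(paths, marker='Abstract'):
    """Map each path to the text after the first line containing `marker`."""
    result = {}
    for path in paths:
        with open(path, 'r') as f:
            # dropwhile skips lines until the predicate is first False,
            # so the marker line is the first item of `dropped`.
            dropped = dropwhile(lambda line: marker not in line, f)
            next(dropped, None)  # discard the marker line itself
            result[path] = ''.join(dropped)  # '' if no marker was found
    return result

# Demo: two throwaway files standing in for the real corpus.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [('a.txt', 'Title\nAbstract\nBody A\n'),
                       ('b.txt', 'No marker here\n')]:
        path = os.path.join(tmp, name)
        with open(path, 'w') as out:
            out.write(text)
        paths.append(path)
    abstracts = collect_after_marker(paths)
    summary = {os.path.basename(p): t for p, t in abstracts.items()}

print(summary)  # {'a.txt': 'Body A\n', 'b.txt': ''}
```

Under this sketch a file without the marker maps to an empty string; you may prefer to skip such files instead. If you need the individual lines rather than one string, `list(dropped)` works the same way.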
answered Oct 02 '22 17:10 by Padraic Cunningham