How to read specific part of large file in Python

Tags:

python

parsing

Given a large file (hundreds of MB), how would I use Python to quickly read the content between a specific start and end index within the file?

Essentially, I'm looking for a more efficient way of doing:

open(filename).read()[start_index:end_index]
asked Mar 26 '13 18:03 by Cerin



2 Answers

You can seek into the file and then read a certain amount from there. Seeking moves you to a specific offset within the file, and you can then limit the read to only the number of bytes in that range.

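with open(filename, 'rb') as fin:  # binary mode: offsets and lengths are in bytes
    fin.seek(start_index)  # jump straight to the start offset
    data = fin.read(end_index - start_index)  # bytes; decode if you need text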

That reads only the requested range instead of the whole file.
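A minimal sketch of the same seek-and-read idea wrapped in a helper, assuming start_index and end_index are byte offsets. The name `read_range` and its `encoding` parameter are hypothetical. Binary mode ('rb') matters here: in text mode Python's `read(n)` counts characters rather than bytes, and `seek()` is only guaranteed to work with 0 or offsets previously returned by `tell()`.

```python
import os
import tempfile


def read_range(path, start, end, encoding=None):
    """Return the bytes in [start, end) of the file at `path`.

    Hypothetical helper. If `encoding` is given, the bytes are decoded;
    undecodable bytes (e.g. a multi-byte character cut at a range edge)
    are replaced instead of raising an error.
    """
    if end < start:
        raise ValueError("end must be >= start")
    with open(path, "rb") as fin:
        fin.seek(start)               # jump straight to the byte offset
        data = fin.read(end - start)  # read only the requested span
    if encoding is not None:
        return data.decode(encoding, errors="replace")
    return data


if __name__ == "__main__":
    # Build a small demo file; a real input would be hundreds of MB.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"0123456789abcdefghij")
        path = tmp.name
    try:
        print(read_range(path, 5, 12))           # b'56789ab'
        print(read_range(path, 5, 12, "ascii"))  # 56789ab
    finally:
        os.remove(path)
```

If you'd rather keep the slice syntax from the question, the standard-library `mmap` module maps the file into memory so that `mm[start:end]` returns just that byte range without reading the whole file first.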

answered Sep 23 '22 18:09 by Dan Lecocq


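This is my solution for text in a variable-width encoding (e.g. UTF-8), where byte offsets don't line up with characters, so it selects a range by row number instead of by byte offset. My CSV file contains a dictionary where each row is a new item.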

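import csv

def get_stuff(filename, count, start_index):
    """Yield `count` rows, starting at 1-based row number `start_index`."""
    with open(filename, 'r', newline='') as infile:
        reader = csv.reader(infile)
        num = 0  # rows yielded so far
        for idx, row in enumerate(reader):
            if idx < start_index - 1:
                continue  # skip rows before the requested start
            if num >= count:
                return  # already yielded `count` rows
            yield row
            num += 1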
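The same row-range idea can be written with `itertools.islice`, which skips the leading rows and stops after `count` rows without a manual counter. This is a sketch assuming, as in `get_stuff`, that `start_index` is a 1-based row number; the name `get_rows` is hypothetical. Unlike `seek()`, it still reads and parses every row before the start.

```python
import csv
import itertools
import os
import tempfile


def get_rows(filename, count, start_index):
    """Yield `count` CSV rows starting at 1-based row `start_index`.

    Hypothetical helper: rows before the start are still read and parsed,
    they are just not yielded.
    """
    with open(filename, "r", newline="") as infile:
        reader = csv.reader(infile)
        first = start_index - 1  # islice() takes 0-based positions
        yield from itertools.islice(reader, first, first + count)


if __name__ == "__main__":
    # Small demo CSV with five rows: ["r1", "1"] ... ["r5", "5"].
    with tempfile.NamedTemporaryFile(
        "w", newline="", suffix=".csv", delete=False
    ) as tmp:
        csv.writer(tmp).writerows([["r%d" % i, str(i)] for i in range(1, 6)])
        path = tmp.name
    try:
        print(list(get_rows(path, 2, 3)))  # [['r3', '3'], ['r4', '4']]
    finally:
        os.remove(path)
```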
answered Sep 23 '22 18:09 by Will Leeney