I have a 25GB file I need to process. Here is what I'm currently doing, but it takes an extremely long time to open:
collection_pricing = os.path.join(pricing_directory, 'collection_price')
with open(collection_pricing, 'r') as f:
    collection_contents = f.readlines()
    length_of_file = len(collection_contents)
    for num, line in enumerate(collection_contents):
        print '%s / %s' % (num+1, length_of_file)
        cursor.execute(...)
How could I improve this?
Unless the lines in your file are really, really long, do not print the progress on every line. Printing to a terminal is very slow; print progress only every 100 or every 1000 lines instead.
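A minimal sketch, reusing the names from the question:

if num % 1000 == 0:  # only report every 1000th line
    print '%s / %s' % (num+1, length_of_file)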
Use the available operating system facilities to get the size of a file - os.path.getsize(), see Getting file size in Python?
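For instance, with the path from the question:

import os

total_size = os.path.getsize(collection_pricing)  # size in bytes, without reading the file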
Get rid of readlines() to avoid reading 25GB into memory. Instead, read and process the file line by line by iterating over the file object, see e.g. How to read large file, line by line in python.
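Putting the three points together, a rough sketch that keeps the question's names (the cursor setup and the execute() arguments stay elided, as in the original; note that len(line) is only an approximate byte count if the platform translates newlines):

import os

collection_pricing = os.path.join(pricing_directory, 'collection_price')
total_size = os.path.getsize(collection_pricing)  # total bytes, for progress reporting

bytes_read = 0
with open(collection_pricing, 'r') as f:
    for num, line in enumerate(f):  # lazy iteration: one line in memory at a time
        bytes_read += len(line)
        if num % 1000 == 0:  # throttled progress output
            print '%.1f%% (%s / %s bytes)' % (100.0 * bytes_read / total_size,
                                              bytes_read, total_size)
        cursor.execute(...)  # per-line processing, unchanged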