How to read a file (or stdin) line by line in Python without waiting for the entire file to be read

Tags:

python

filter

We have a couple of huge files on disk (larger than the available RAM). I want to read them line by line in Python and print the results to the terminal. I have gone through [1] and [2], but I am looking for methods that do not wait until the entire file has been read into memory.

I will be running the scripts in both of these ways:

cat fileName | python myScript1.py
python myScript2.py fileName

[1] How do you read from stdin in Python?
[2] How do I write a unix filter in python?

asked Oct 17 '11 09:10 by BiGYaN


2 Answers

Iterating over a file object already behaves this way: Python reads the file lazily, one line at a time, so the whole file is never loaded into memory:

with open("myfile.txt", "r") as myfile:
    for line in myfile:
        # do something with the current line
        print(line, end="")

or

import sys
for line in sys.stdin:
    # do something with the current line
    print(line, end="")
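Since the question wants one script that works both as `cat fileName | python myScript1.py` and as `python myScript2.py fileName`, the standard-library `fileinput` module can cover both cases: it iterates lazily over the files named on the command line, or over stdin when none are given. A minimal sketch; the function `echo_lines` and its `write` parameter are hypothetical names chosen for illustration:

```python
import fileinput
import sys


def echo_lines(files=None, write=None):
    """Copy every input line to `write`, one line at a time.

    With files=None, fileinput takes the file names from sys.argv[1:],
    falling back to sys.stdin when there are none, so one script covers
    both `cat fileName | python script.py` and `python script.py fileName`.
    Returns the number of lines processed.
    """
    write = sys.stdout.write if write is None else write
    count = 0
    with fileinput.FileInput(files=files) as lines:
        for line in lines:
            # Only the current line (plus an internal read buffer) is held in memory.
            write(line)
            count += 1
    return count
```

In a script you would call `echo_lines()` at the bottom (typically under `if __name__ == "__main__":`); `python myScript2.py fileName` then reads `fileName`, while `cat fileName | python myScript2.py` reads stdin.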
answered Oct 05 '22 02:10 by Tim Pietzcker


Just iterate over the file:

with open('huge.file') as hf:
    for line in hf:
        if 'important' in line:
            print(line, end='')  # line already ends with '\n'

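This needs only constant memory relative to the file size: at any moment only the current line and a small read buffer are held in memory, so memory use is bounded by the longest line, not by the size of the file.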
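The same loop can be packaged as a generator, which makes the constant-memory behavior explicit: each line is pulled from the input only when the consumer asks for the next match. A minimal sketch; `grep_lines` and `first_matches` are hypothetical names, and because they accept any iterable of strings they work on an open file, on `sys.stdin`, or on a plain list in a quick test:

```python
from itertools import islice


def grep_lines(lines, needle):
    """Yield the lines that contain `needle`, one at a time.

    `lines` may be any iterable of str (an open file, sys.stdin, a list).
    Nothing is collected into a list, so memory does not grow with input size.
    """
    for line in lines:
        if needle in line:
            yield line


def first_matches(lines, needle, n):
    # islice stops pulling from the generator after n matches, so the rest
    # of the input is never read at all.
    return list(islice(grep_lines(lines, needle), n))
```

For example, `first_matches(hf, 'important', 10)` would stop reading `huge.file` as soon as ten matching lines have been found.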

To read from stdin, simply iterate over sys.stdin instead of hf:

import sys
for line in sys.stdin:
    if 'important' in line:
        print(line, end='')
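One caveat applies when stdin is a pipe that produces data slowly (for example `tail -f some.log | python myScript1.py`): on Python 2, `for line in sys.stdin` used a hidden read-ahead buffer, so lines could arrive in batches rather than as soon as they were written. The workaround of that era was to call `sys.stdin.readline()` in a loop; in Python 3, iterating over `sys.stdin` does not have this delay. A sketch of the `readline()` pattern written for Python 3; the function `follow` and its `stream` and `needle` parameters are hypothetical, and `stream` exists only so the function can be exercised without a real pipe:

```python
import sys


def follow(stream=None, needle="important"):
    """Print matching lines as soon as each one arrives on `stream`.

    iter(callable, sentinel) calls stream.readline() until it returns ""
    (end of input), handing back each line immediately; this is the
    pattern used on Python 2 to avoid the file iterator's read-ahead buffer.
    Returns the number of matching lines.
    """
    stream = sys.stdin if stream is None else stream
    matched = 0
    for line in iter(stream.readline, ""):
        if needle in line:
            # flush=True pushes the line out even when stdout is a pipe
            # rather than a terminal (where it would be block-buffered).
            print(line, end="", flush=True)
            matched += 1
    return matched
```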
answered Oct 05 '22 02:10 by phihag