We have a couple of huge files (larger than the size of RAM) on disk. I want to read them line by line in Python and print results to the terminal. I have gone through [1] and [2], but I am looking for methods that do not wait until the entire file is read into memory.
I will be running the scripts in both of these ways:
cat fileName | python myScript1.py
python myScript2.py fileName
[1] How do you read from stdin in Python?
[2] How do I write a Unix filter in Python?
This is the standard behavior of file objects in Python: iterating over a file yields one line at a time, so the file is never loaded into memory as a whole:
with open("myfile.txt", "r") as myfile:
for line in myfile:
# do something with the current line
or, reading from standard input:

import sys

for line in sys.stdin:
    print(line, end="")  # do something with the current line
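Both invocation styles from the question can be handled by one script with the standard-library fileinput module, which lazily iterates over the files named on the command line, or over stdin when no filenames are given; a minimal sketch:

import fileinput

# Yields lines from the files listed in sys.argv[1:], or from stdin
# when no filenames are given, so the same script works both as
# "cat fileName | python myScript.py" and "python myScript.py fileName".
for line in fileinput.input():
    print(line, end="")  # do something with the current line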
Just iterate over the file:
with open('huge.file') as hf:
    for line in hf:
        if 'important' in line:
            print(line, end='')  # keep the line's own '\n'; avoid blank lines
This requires O(1) memory: only the current line is held at a time, regardless of the file's size.
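Because the iteration is lazy, it also composes with generator expressions, so aggregates over a huge file stay within constant memory; a small sketch reusing the huge.file name from above:

# Count matching lines without holding more than one line in memory.
with open('huge.file') as hf:
    count = sum(1 for line in hf if 'important' in line)
print(count)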
To read from stdin, simply iterate over sys.stdin instead of hf:
import sys

for line in sys.stdin:
    if 'important' in line:
        print(line, end='')  # keep the line's own '\n'; avoid blank lines
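For the second invocation style from the question (python myScript2.py fileName), the filename arrives in sys.argv rather than on stdin; a minimal sketch, assuming a single filename argument:

import sys

# Open the file named on the command line and filter it lazily,
# line by line, just like the stdin version above.
with open(sys.argv[1]) as f:
    for line in f:
        if 'important' in line:
            print(line, end='')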