I have a Python script that processes a huge text file (around 4 million lines) and writes the data into two separate files.
For debugging, I added a print statement that outputs a string for every line. I want to know how bad this could be from a performance perspective.
If it is going to be very bad, I can remove the debugging line.
Edit
It turns out that having a print statement for every line of a 4-million-line file increases the run time far too much.
I tried it in a very simple script just for fun, and the difference is staggering:
In large.py:
target = open('target.txt', 'w')
for item in xrange(4000000):
    target.write(str(item)+'\n')
    print item
Timing it:
[gp@imdev1 /tmp]$ time python large.py
real 1m51.690s
user 0m10.531s
sys 0m6.129s
[gp@imdev1 /tmp]$ ls -lah target.txt
-rw-rw-r--. 1 gp gp 30M Nov 8 16:06 target.txt
Now running the same with "print" commented out:
[gp@imdev1 /tmp]$ time python large.py
real 0m2.584s
user 0m2.536s
sys 0m0.040s
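If some progress output is still wanted, a common compromise is to print only once every N lines, so the console overhead is divided by N rather than paid on every iteration. A minimal sketch (the `write_items` helper and its parameters are just illustrative names, not part of the original script):

```python
def write_items(path, n, step=100000):
    """Write n numbered lines to path, printing progress every `step` lines."""
    with open(path, 'w') as target:
        for item in range(n):
            target.write(str(item) + '\n')
            # Printing once per `step` iterations keeps some debugging
            # visibility without the per-line console cost.
            if item % step == 0:
                print('processed', item, 'lines')

write_items('target.txt', 100000)
```

The same idea works with the `logging` module: set the handler level above `DEBUG` in production and the per-line `logger.debug(...)` calls become cheap no-ops.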
Yes, it affects performance. I wrote a small program to demonstrate it:
import time
start_time = time.time()
for i in range(100):
    for j in range(100):
        for k in range(100):
            print(i, j, k)
print(time.time() - start_time)
input()
The time measured was 160.2812204496765 seconds. Then I replaced the print statement with pass, and the results were shocking: the measured time without print was 0.26517701148986816 seconds.
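The same comparison can be made a bit more rigorously with the standard-library `timeit` module, which runs each variant repeatedly and avoids counting interpreter start-up. A rough sketch (stdout is redirected to an in-memory buffer, which already understates the cost of printing to a real terminal, yet the gap is still clear):

```python
import io
import sys
import timeit

def with_print():
    # Redirect stdout so 100,000 lines don't flood the console;
    # even writing to an in-memory buffer, print has real overhead.
    buf = io.StringIO()
    old_stdout = sys.stdout
    sys.stdout = buf
    try:
        for i in range(100000):
            print(i)
    finally:
        sys.stdout = old_stdout

def without_print():
    for i in range(100000):
        pass

t_print = timeit.timeit(with_print, number=10)
t_pass = timeit.timeit(without_print, number=10)
print('with print:    %.4f s' % t_print)
print('without print: %.4f s' % t_pass)
```

On a real terminal the ratio is far larger, because each print can force the terminal emulator to render a line.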