I wrote a script in Python, and it surprised me. Basically, it takes five 20-digit numbers, multiplies them, and then raises the product to the power of 3000. The timeit module is used to find the time required for the calculation. Well, when I run this script, it says the calculation took 3*10^-7 seconds. It then generates a file, output.txt, but the script doesn't finish until about 15 seconds later.
import timeit
outputFile = open("output.txt", "w")
start = timeit.default_timer()
x = (87459837581209463928*23745987364728194857*27385647593847564738*10293769154925693856*12345678901234567891)**3000
stop = timeit.default_timer()
time = stop-start
print "Time taken for the calculation was {} seconds".format(time)
outputFile.writelines(str(x))
outputFile.close()
y = raw_input("Press enter to exit.")
Does this mean that it actually takes longer to print a 280 kB file than to perform the calculation? (I find that unlikely.)
If not, does Python only execute the calculation when the variable x is used? Will it execute the calculation every time the variable is referenced, or will it store the actual value in the variable?
I have just written another script, which confirms that it takes Python 0.03 seconds to write the result to a .txt file. So why does Python execute the calculation later?
So, why does Python execute the calculation later? It doesn't; I/O operations (like printing) are generally slower than most calculations, especially when what you're printing is a number about 300,000 digits long.
For example, a script that issues a print statement for every line of a 4-million-line file will be much slower than one that writes in bulk: having to perform a large number of prints, or any extra processing per item, is going to incur some performance penalty. A rough sketch of this is shown below.
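Here is a rough sketch (Python 2, as in the question; the file names and line count are made up) that times many per-line writes against one bulk write:
import timeit

# 100,000 short lines of made-up output
lines = ["line %d\n" % i for i in range(100000)]

def per_line():
    # one write call per line
    with open("per_line.txt", "w") as f:
        for line in lines:
            f.write(line)

def bulk():
    # a single write call for everything
    with open("bulk.txt", "w") as f:
        f.write("".join(lines))

print "per-line writes:", timeit.timeit(per_line, number=10)
print "one bulk write: ", timeit.timeit(bulk, number=10)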
It's not the calculation that's the problem, nor is it writing to file: the vast bulk of the time is consumed by converting the result from its internal binary representation to a base-10 representation. That takes time quadratic in the number of bits, and you have a lot of bits here.
If you replace your output line with:
outputFile.writelines(hex(x))
you'll see that it runs very much faster. Converting to a hex representation only takes time linear in the number of bits.
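If you want to see the difference for yourself, here is a minimal sketch (Python 2, as in the question; the timings are only illustrative) that times both conversions on the same number:
import timeit

# the same ~300,000-digit number as in the question
x = (87459837581209463928 * 23745987364728194857 * 27385647593847564738 *
     10293769154925693856 * 12345678901234567891) ** 3000

print "str(x) took", timeit.timeit(lambda: str(x), number=1), "seconds"
print "hex(x) took", timeit.timeit(lambda: hex(x), number=1), "seconds"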
If you really need to output giant integers in base-10 representation, look into using the decimal module instead. That does computations internally in a representation related to base 10, and then conversion to a decimal string takes time linear in the number of decimal digits. You'll need to set the decimal context's precision to a "big enough" value in advance, though, to avoid losing lower-order digits to rounding.
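A minimal sketch of that approach (Python 2 syntax to match the question; the precision of 1,000,000 digits is an assumption chosen to comfortably exceed the roughly 300,000 digits of the exact result). Note that the decimal module shipped with Python 2 is pure Python and slow; the real speed benefit comes from the C implementation (the third-party cdecimal package, or the built-in _decimal in Python 3.3+):
import decimal

# "big enough" precision so no lower-order digits are rounded away (assumption)
decimal.getcontext().prec = 1000000

d = (decimal.Decimal(87459837581209463928) *
     decimal.Decimal(23745987364728194857) *
     decimal.Decimal(27385647593847564738) *
     decimal.Decimal(10293769154925693856) *
     decimal.Decimal(12345678901234567891)) ** 3000

# converting a Decimal to a base-10 string is linear in the number of digits
with open("output.txt", "w") as f:
    f.write(str(d))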
It is conversion to string that takes so long:
In [68]: %time x = (87459837581209463928*23745987364728194857*27385647593847564738*10293769154925693856*12345678901234567891)**3000
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
In [69]: %time xs = str(x)
CPU times: user 1.98 s, sys: 0.00 s, total: 1.98 s
Wall time: 1.98 s
In [71]: %time print xs
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.04 s
But that should not be surprising for a number that has hundreds of thousands of digits.
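For a sense of scale, you can estimate the digit count without paying for the full str() conversion; a small sketch (needs Python 2.7+ for int.bit_length, and the estimate can be off by a digit):
import math

x = (87459837581209463928 * 23745987364728194857 * 27385647593847564738 *
     10293769154925693856 * 12345678901234567891) ** 3000

# log10(x) is bit_length * log10(2), to within rounding
digits = int(x.bit_length() * math.log10(2)) + 1
print "roughly", digits, "decimal digits"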
EDIT
Contrary to other answers, writing to file does not take that much time:
In [72]: %time with open('tmp.file', 'w') as f: f.write(xs)
CPU times: user 0.00 s, sys: 0.01 s, total: 0.01 s
Wall time: 0.00 s
In addition to the other answers, use outputFile.write(str(x)) instead of writelines. writelines is meant to be used with a sequence of strings; in your case, it iterates over the string and writes each character individually. In a simple test, writelines was 3.7 times slower:
>>> timeit("f.writelines(str(s))", setup="f=open('tmp.txt','w');s=range(1000)", number=10000)
4.935087700632465
>>> timeit("f.write(str(s))", setup="f=open('tmp.txt','w');s=range(1000)", number=10000)
1.3468097837871085
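The underlying reason is that iterating a string yields 1-character strings, so writelines ends up doing one write per character. A small sketch of the equivalence (Python 2; the file name and the stand-in big integer are made up):
# A string is itself an iterable of 1-character strings:
print list("12345")        # ['1', '2', '3', '4', '5']

x = 2 ** 100000            # stand-in for the question's big integer

with open("tmp.txt", "w") as f:
    f.writelines(str(x))   # roughly: for ch in str(x): f.write(ch)

with open("tmp.txt", "w") as f:
    f.write(str(x))        # a single write call for the whole string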