I have a very long string, almost a megabyte in size, that I need to write to a text file. The usual approach
file = open("file.txt","w")
file.write(string)
file.close()
works, but it is too slow. Is there a faster way to write it?
I am trying to write a number with several million digits to a text file.
The number is on the order of math.factorial(67867957).
This is what shows on profiling:
203 function calls (198 primitive calls) in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 re.py:217(compile)
1 0.000 0.000 0.000 0.000 re.py:273(_compile)
1 0.000 0.000 0.000 0.000 sre_compile.py:172(_compile_charset)
1 0.000 0.000 0.000 0.000 sre_compile.py:201(_optimize_charset)
4 0.000 0.000 0.000 0.000 sre_compile.py:25(_identityfunction)
3/1 0.000 0.000 0.000 0.000 sre_compile.py:33(_compile)
1 0.000 0.000 0.000 0.000 sre_compile.py:341(_compile_info)
2 0.000 0.000 0.000 0.000 sre_compile.py:442(isstring)
1 0.000 0.000 0.000 0.000 sre_compile.py:445(_code)
1 0.000 0.000 0.000 0.000 sre_compile.py:460(compile)
5 0.000 0.000 0.000 0.000 sre_parse.py:126(__len__)
12 0.000 0.000 0.000 0.000 sre_parse.py:130(__getitem__)
7 0.000 0.000 0.000 0.000 sre_parse.py:138(append)
3/1 0.000 0.000 0.000 0.000 sre_parse.py:140(getwidth)
1 0.000 0.000 0.000 0.000 sre_parse.py:178(__init__)
10 0.000 0.000 0.000 0.000 sre_parse.py:183(__next)
2 0.000 0.000 0.000 0.000 sre_parse.py:202(match)
8 0.000 0.000 0.000 0.000 sre_parse.py:208(get)
1 0.000 0.000 0.000 0.000 sre_parse.py:351(_parse_sub)
2 0.000 0.000 0.000 0.000 sre_parse.py:429(_parse)
1 0.000 0.000 0.000 0.000 sre_parse.py:67(__init__)
1 0.000 0.000 0.000 0.000 sre_parse.py:726(fix_flags)
1 0.000 0.000 0.000 0.000 sre_parse.py:738(parse)
3 0.000 0.000 0.000 0.000 sre_parse.py:90(__init__)
1 0.000 0.000 0.000 0.000 {built-in method compile}
1 0.001 0.001 0.001 0.001 {built-in method exec}
17 0.000 0.000 0.000 0.000 {built-in method isinstance}
39/38 0.000 0.000 0.000 0.000 {built-in method len}
2 0.000 0.000 0.000 0.000 {built-in method max}
8 0.000 0.000 0.000 0.000 {built-in method min}
6 0.000 0.000 0.000 0.000 {built-in method ord}
48 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5 0.000 0.000 0.000 0.000 {method 'find' of 'bytearray' objects}
1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
Your issue is that str(long) is very slow for large integers (millions of digits) in Python. The conversion is quadratic in the number of digits, i.e., for ~1e8 digits it may require ~1e16 operations to convert the integer to a decimal string.
Writing a 500 MB file should not take hours, e.g.:
$ python3 -c 'open("file", "w").write("a"*500*1000000)'
returns almost immediately. ls -l file confirms that the file is created and it has the expected size.
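To check where the time goes on your own machine, you could time the conversion and the write separately. This is only a minimal sketch: the helper name time_convert_and_write and the test number are made up for illustration, and the absolute timings depend on your Python version and hardware. The guard at the top assumes Python 3.11+, which caps int-to-str conversion at 4300 digits by default; on older versions the attribute does not exist and the guard is skipped.

```python
import os
import sys
import tempfile
import time

# Python 3.11+ limits int <-> str conversion to 4300 digits by default.
# Lift the limit where the setting exists; older versions have no such cap.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def time_convert_and_write(n, path):
    """Return (seconds in str(n), seconds writing the file, digit count).

    Hypothetical helper for illustration, not part of any library.
    """
    t0 = time.perf_counter()
    digits = str(n)              # decimal conversion: the suspected bottleneck
    t1 = time.perf_counter()
    with open(path, "w") as f:
        f.write(digits)          # plain file write
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(digits)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.gettempdir(), "digits_demo.txt")
    convert_s, write_s, n_digits = time_convert_and_write(7 ** 50000, demo_path)
    print("digits: %d, str(): %.4fs, write: %.4fs" % (n_digits, convert_s, write_s))
```

Try it with progressively larger numbers: if the str() time grows much faster than the write time, the conversion is the part to optimize, not the file I/O.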
Calculating math.factorial(67867957) (the result has ~500M digits) may take several hours but saving it using pickle is instantaneous:
import math
import pickle

n = math.factorial(67867957)  # takes a long time
with open("file.pickle", "wb") as file:
    pickle.dump(n, file)  # very fast (comparatively)
Loading it back with n = pickle.load(open('file.pickle', 'rb')) takes less than a second.
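You can try the pickle round trip on a much smaller factorial first. The sketch below uses an illustrative helper name (pickle_roundtrip) and a temporary file path, neither of which comes from the question. Pickle stores the integer in a binary form, so no decimal conversion is involved, but the resulting file is not a human-readable text file.

```python
import math
import os
import pickle
import tempfile


def pickle_roundtrip(n, path):
    """Dump n to path with pickle, then load it back and return it.

    Hypothetical helper for illustration.
    """
    with open(path, "wb") as f:
        pickle.dump(n, f)      # binary dump, no decimal string is built
    with open(path, "rb") as f:
        return pickle.load(f)


n = math.factorial(20000)      # small enough to compute quickly
path = os.path.join(tempfile.gettempdir(), "factorial_demo.pickle")
restored = pickle_roundtrip(n, path)

print(restored == n)           # the loaded value should equal the original
print(os.path.getsize(path), "bytes on disk for a", n.bit_length(), "bit integer")
```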
str(n) is still running (after 50 hours) on my machine.
To get the decimal representation fast, you could use gmpy2:
$ python -c'import gmpy2;open("file.gmpy2", "w").write(str(gmpy2.fac(67867957)))'
It takes less than 10 minutes on my machine.
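The same idea as a script rather than a one-liner might look like the sketch below. It assumes gmpy2 is installed (for example with pip install gmpy2) and falls back to the built-in math.factorial and str() when it is not, so it still runs, just without the speedup. The function names factorial_decimal and write_factorial are invented for this example.

```python
import math
import sys

try:
    import gmpy2  # third-party package; assumed to be installed
except ImportError:
    gmpy2 = None  # fall back to the slower built-in path

# Only relevant for the fallback: Python 3.11+ caps int-to-str at 4300 digits.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def factorial_decimal(k):
    """Return k! as a decimal string, using gmpy2 when available."""
    if gmpy2 is not None:
        return str(gmpy2.fac(k))      # GMP does the factorial and the conversion
    return str(math.factorial(k))     # built-in fallback, slow for huge k


def write_factorial(k, path):
    """Write the decimal digits of k! to path and return the digit count."""
    digits = factorial_decimal(k)
    with open(path, "w") as f:
        f.write(digits)
    return len(digits)
```

For example, write_factorial(1000, "fact1000.txt") writes the digits of 1000! to a text file; start with small arguments before attempting anything near 67867957.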
OK, this is not really an answer; it is more to show that your reasoning about the delay is wrong.
First, test the write speed of a big string:
import timeit

def write_big_str(n_bytes=1000000):
    with open("test_file.txt", "wb") as f:
        f.write(b"a" * n_bytes)  # bytes literal, since the file is opened in binary mode

print(timeit.timeit("write_big_str()", "from __main__ import write_big_str", number=100))
You should see a fairly respectable speed (and that is for 100 repetitions).
Next, see how long it takes to convert a very big number to a str:
import timeit, math

n = math.factorial(200000)
# On Python 3.11+, call sys.set_int_max_str_digits(0) first, or str(n) raises ValueError.
print(timeit.timeit("str(n)", "from __main__ import n", number=1))
It will probably take ~10 seconds (and that is a million-digit number), which granted is slow, but not hours slow. (OK, it is pretty slow to convert to a string :P, but it still shouldn't take hours. It actually took more like 243 seconds on my box, I guess :P)