I have a 30 MB .txt file containing a single line of data: a 30-million-digit number.
Unfortunately, every method I've tried (mmap.read(), readline(), allocating 1 GB of RAM, for loops) takes 45+ minutes to read the file completely.
Every method I found on the internet assumes that each line is small, so memory consumption is only as large as the longest line in the file. Here's the code I've been using:
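import mmap
import time

start = time.perf_counter()  # time.clock() is deprecated and was removed in Python 3.8
with open('Number.txt', 'rb') as z:
    m = mmap.mmap(z.fileno(), 0, access=mmap.ACCESS_READ)  # read-only map of the whole file
    a = int(m.read())  # parse the digits into an int
    m.close()
secs = time.perf_counter() - start
with open('log.txt', 'a') as f:  # hypothetical log file; the original printed to an already-open f
    print("Number read in", secs, "seconds.", file=f)
print("Number read in", secs, "seconds.")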
Other than splitting the number across multiple lines, which I'd rather not do, is there a cleaner method that won't take the better part of an hour?
By the way, I don't necessarily have to use text files.
I have: Windows 8.1 64-Bit, 16GB RAM, Python 3.5.1
The best way to view extremely large text files is to use… a text editor. Not just any text editor, but the tools meant for writing code. Such apps can usually handle large files without a hitch and are free. Large Text File Viewer is probably the simplest of these applications.
The readline() method reads one line at a time. Each call returns the next line of the file, which is the first line on the first call. If you call readline() repeatedly, you read the file line by line. To read all the remaining lines at once into a list, use readlines().
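Here is a minimal sketch of the difference. It uses a small hypothetical file, sample.txt, in a temporary directory. For a file whose entire content is one 30-million-character line, a single readline() call still has to return that whole line, so neither method makes that case any cheaper.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("first\nsecond\nthird\n")

    with open(path) as f:
        first = f.readline()  # the next line (here the first), newline included
        rest = f.readlines()  # every remaining line, as a list

    with open(path) as f:
        line_count = sum(1 for _ in f)  # iterating the file object also reads line by line

print(repr(first), rest, line_count)  # 'first\n' ['second\n', 'third\n'] 3
```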
The file read is quick (<1s):
with open('number.txt') as f:
    data = f.read()
Converting a 30-million-digit string to an integer is the slow part:
z = int(data)  # still waiting...
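The sketch below times the read and the conversion separately, so you can see where the time goes. It is scaled down to an assumed 100,000 random digits so that it finishes quickly. With CPython's traditional decimal conversion, the cost of int() grows much faster than linearly with the number of digits, which is why the full 30-million-digit case takes so long.

On Python 3.11+ (and some security backports), int() refuses decimal strings longer than 4300 digits by default. The sketch therefore lifts that limit with sys.set_int_max_str_digits(0) wherever that function exists. random.choices requires Python 3.6+.

```python
import random
import sys
import tempfile
import time
from pathlib import Path

# Lift the default int/str conversion length limit where it exists (Python 3.11+ and backports).
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

DIGITS = 100_000  # assumption: far fewer than 30M so the demo runs quickly

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "number.txt"  # hypothetical test file holding one long line
    # A leading '1' guarantees the number really has DIGITS digits (no leading zero).
    path.write_text("1" + "".join(random.choices("0123456789", k=DIGITS - 1)))

    t0 = time.perf_counter()
    with open(path) as f:
        data = f.read()  # reading the text: fast
    t1 = time.perf_counter()
    z = int(data)  # decimal-to-binary conversion: the expensive step
    t2 = time.perf_counter()

print("read(): %.4f s, int(): %.4f s" % (t1 - t0, t2 - t1))
```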
If you store the number as raw big- or little-endian binary data instead, then int.from_bytes(data, 'big') (or 'little') is much quicker.
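Here is a minimal sketch of the binary approach, assuming the number is non-negative. The helper names save_binary and load_binary and the file name Number.bin are hypothetical. The slow text-to-int conversion only has to happen once. After that, every load is a single read plus int.from_bytes.

```python
import os
import tempfile


def save_binary(n: int, path: str) -> None:
    """Hypothetical helper: write a non-negative int as raw big-endian bytes."""
    nbytes = max(1, (n.bit_length() + 7) // 8)  # smallest whole number of bytes (at least 1)
    with open(path, "wb") as f:
        f.write(n.to_bytes(nbytes, "big"))


def load_binary(path: str) -> int:
    """Hypothetical helper: read the raw bytes back into an int."""
    with open(path, "rb") as f:
        return int.from_bytes(f.read(), "big")


# One-time conversion of the existing text file (still slow, but only done once):
#   z = int(open('Number.txt').read())  # on Python 3.11+ call sys.set_int_max_str_digits(0) first
#   save_binary(z, 'Number.bin')

# Round-trip demo with a random ~1 MB number instead of the real file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Number.bin")
    original = int.from_bytes(os.urandom(1_000_000), "big")
    save_binary(original, path)
    restored = load_binary(path)

print(restored == original)  # True
```

Negative numbers would need signed=True in both to_bytes and from_bytes. The sketch leaves that out because the question only concerns a non-negative number.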
If I did my math right (note: _ means "last line's answer" in Python's interactive interpreter):
>>> import math
>>> math.log(10)/math.log(2) # Number of bits to represent a base 10 digit.
3.3219280948873626
>>> 30000000*_ # Number of bits to represent 30M-digit #.
99657842.84662087
>>> _/8 # Number of bytes to represent 30M-digit #.
12457230.35582761 # Only ~12MB so file will be smaller :^)
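The same estimate can be written as a small function. This sketch rounds up, so the result is an upper bound on the bytes needed for any non-negative number with that many decimal digits. The name bytes_for_digits is illustrative and does not come from the original.

```python
import math


def bytes_for_digits(digits: int) -> int:
    """Bytes needed to store any non-negative integer with `digits` decimal digits."""
    bits = math.ceil(digits * math.log2(10))  # each decimal digit carries log2(10) ≈ 3.32 bits
    return (bits + 7) // 8  # round up to whole bytes


print(bytes_for_digits(30_000_000))  # 12457231
```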
>>> import os
>>> data=os.urandom(12457231) # Generate some random bytes
>>> z=int.from_bytes(data,'big') # Convert to integer (<1s)
99657848
>>> math.log10(z) # number of base-10 digits in number.
30000001.50818886
EDIT: FYI, my math wasn't right, but I fixed it. Thanks for 10 upvotes without noticing :^)