
How to read a 100GB one-line text file in Python?

I'm on Windows, using Python 3. Since file readers consume a file line by line by default, I'm having difficulty dealing with my 100GB text file, which consists of a single line.

I'm aware of solutions such as this, which introduce a custom record separator by replacing a frequent character with \n, but I wonder: is there any way to consume and process my file using only Python?

I have only 8GB of RAM. My file contains sales records (item, price, buyer, ...). My processing mostly consists of editing price numbers. Records are separated from each other by the | character.

wiki asked Mar 03 '23 13:03


1 Answer

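#!/usr/bin/python3
import os

# Open the file read-only; O_BINARY exists only on Windows and disables newline translation
fd = os.open("foo.txt", os.O_RDONLY | getattr(os, "O_BINARY", 0))

# Read at most 12 bytes; the rest of the file is never loaded into memory
ret = os.read(fd, 12)
# errors="replace" guards against a multi-byte character cut off at the read boundary
print(ret.decode(errors="replace"))

# Close the file descriptor
os.close(fd)
print("Closed the file successfully!!")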

or

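filename = "foo.txt"          # placeholder path
max_size = 64 * 1024 * 1024   # bytes per read; keep well below available RAM

def process(buf):
    pass  # placeholder: handle one chunk of bytes here

with open(filename, 'rb') as f:
    while True:
        buf = f.read(max_size)
        if not buf:  # an empty bytes object means end of file
            break
        process(buf)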

or

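from functools import partial

with open('somefile', 'rb') as openfileobject:
    # iter() calls read(1024) repeatedly until it returns b'' (end of file)
    for chunk in iter(partial(openfileobject.read, 1024), b''):
        do_something(chunk)  # do_something is a placeholder for your own per-chunk handler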
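
The snippets above read fixed-size chunks, so a record can be cut in half at a chunk boundary. Since the question says records are separated by |, one possible way to handle that is to carry the unfinished tail of each chunk over to the next read. This is only a minimal sketch: iter_records, rewrite and the edit callback are hypothetical names, and the record layout is not given in the question, so the actual price edit is left to the caller.

```python
import io

def iter_records(stream, sep="|", chunk_size=1024 * 1024):
    """Yield sep-delimited records from a text stream, reading one chunk at a time."""
    tail = ""  # incomplete record carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty string means end of file
            break
        parts = (tail + chunk).split(sep)
        tail = parts.pop()  # the last piece may be cut off mid-record
        yield from parts
    if tail:
        yield tail  # final record (no trailing separator after it)

def rewrite(src, dst, edit, sep="|", chunk_size=1024 * 1024):
    """Stream records from src to dst, applying edit() to each one.

    Note: a trailing separator at the very end of the input is not reproduced.
    """
    first = True
    for record in iter_records(src, sep, chunk_size):
        if not first:
            dst.write(sep)
        dst.write(edit(record))
        first = False

# Small in-memory demo; str.upper stands in for a real price edit
demo_out = io.StringIO()
rewrite(io.StringIO("a,1|b,2|c,3"), demo_out, str.upper, chunk_size=4)
print(demo_out.getvalue())  # A,1|B,2|C,3
```

For the real file you would pass two file objects instead, for example open("sales.txt", encoding="utf-8", newline="") as the source and a second file opened with "w" as the destination (both paths and the UTF-8 encoding are assumptions). Memory use then stays around one chunk plus the longest single record, and writing to a new file avoids editing 100GB in place.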
kgr answered Mar 16 '23 15:03