
Python - Opening and changing large text files

I have a ~600 MB Roblox-type .mesh file that reads like a text file in any text editor. I have the following code:

mesh = open("file.mesh", "r").read()
mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{")
mesh = "{"+mesh+"}"
f = open("p2t.txt", "w")
f.write(mesh)

It returns:

Traceback (most recent call last):
  File "C:\TheDirectoryToMyFile\p2t2.py", line 2, in <module>
    mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{")
MemoryError

Here is a sample of my file:

[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709][0.00000, 0.00000, 0][-0.00598, 0.001472, 0.00599][0.09943, 0.79220, 0.60211][0.00000, 0.00000, 0]

What can I do?

Edit:

I'm not sure what the head, follow, and tail commands are in the other thread that this was marked as a duplicate of. I tried to use them, but couldn't get them to work. The file is also one giant line; it isn't split into lines.

GShocked asked Jun 22 '15 03:06




2 Answers

You need to read one character per iteration, process it, and then write it to another file or to sys.stdout. Try this code:

with open("file.mesh", "r") as mesh, open("file-1.mesh", "w") as mesh_out:
    # The first character is assumed to be "[", so it becomes the opening "{".
    c = mesh.read(1)
    if c:
        mesh_out.write("{")

    # Copy the rest one character at a time; read() returns "" at end of file.
    while c:
        c = mesh.read(1)
        if c == "[":
            mesh_out.write(",{")
        elif c == "]":
            mesh_out.write("}")
        else:
            mesh_out.write(c)

Update:

That version is really slow (thanks to jamylak), so I've changed it to read 8192 characters at a time:

import sys


def process_file(fname, chunk_size=8192):
    with open(fname, "r") as mesh:
        # The first character is assumed to be the opening "[".
        c = mesh.read(1)
        if c == '':
            return
        sys.stdout.write('{')

        while True:
            # Read a block at a time instead of a single character.
            c = mesh.read(chunk_size)
            if c == '':
                return

            # Both replacements act on single characters, so a chunk
            # boundary can never split a match.
            c = c.replace('[', ',{').replace(']', '}')
            sys.stdout.write(c)


if __name__ == '__main__':
    process_file(sys.argv[1])

It now takes about 15 seconds on a 1.4 GB file. To run it:

$ python mesh.py file.mesh > file-1.mesh
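
If the output has to match the question's replace() chain exactly, including the outer braces, the chunked idea can be adapted as below. This is only a sketch: the function name convert_stream and the chunk_size parameter are made up for illustration, and the one subtle point is that a "}{" pair can be split across two chunks, so the seam is patched by hand.

```python
import io


def convert_stream(src, dst, chunk_size=1 << 16):
    """Stream src to dst, mirroring the question's replace() chain."""
    dst.write("{")                      # outer opening brace
    prev_last = ""                      # last character written so far
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        chunk = (chunk.replace("[", "{")
                      .replace("]", "}")
                      .replace("}{", "},{"))
        # A "}{" pair split across two chunks is missed by replace(),
        # so insert the comma at the seam manually.
        if prev_last == "}" and chunk[0] == "{":
            dst.write(",")
        dst.write(chunk)
        prev_last = chunk[-1]
    dst.write("}")                      # outer closing brace


# Check against the in-memory version on a piece of the question's sample,
# with a tiny chunk size so that chunk boundaries are actually exercised.
sample = "[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709][0.00000, 0.00000, 0]"
out = io.StringIO()
convert_stream(io.StringIO(sample), out, chunk_size=7)
expected = "{" + sample.replace("[", "{").replace("]", "}").replace("}{", "},{") + "}"
matches = out.getvalue() == expected

# With real files (names taken from the question):
# with open("file.mesh") as src, open("p2t.txt", "w") as dst:
#     convert_stream(src, dst)
```

Memory use stays at roughly one chunk no matter how large the file is, because nothing beyond the current chunk and one remembered character is kept.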
Pavel Reznikov answered Sep 22 '22 03:09


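You could do it line by line (this only reduces memory use if the file actually contains line breaks):

with open("file.mesh", "r") as mesh, open("p2t.txt", "w") as f:
    for line in mesh:
        line = line.replace("[", "{").replace("]", "}").replace("}{", "},{")
        # Strip the newline before wrapping so the closing brace stays on the same line.
        f.write("{" + line.rstrip("\n") + "}\n")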
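
Since the question says the file is one giant line, iterating by line would probably hand back the whole content as a single string. A quick way to see this, and one possible alternative using fixed-size reads, is sketched below. The data string is an invented stand-in for the real file, and 4096 is an arbitrary chunk size.

```python
import io
from functools import partial

# Stand-in for the real file: bracketed triples with no newline anywhere.
data = "[0.1, 0.2, 0.3]" * 1000

# Line iteration finds no "\n", so it yields everything as one string.
lines = list(io.StringIO(data))
single_line = len(lines) == 1 and len(lines[0]) == len(data)

# iter(callable, sentinel) keeps calling read(4096) until it returns "",
# so each piece stays small regardless of line structure.
chunks = list(iter(partial(io.StringIO(data).read, 4096), ""))
largest = max(len(c) for c in chunks)
```

Here `single_line` is True, while `largest` never exceeds the chosen chunk size.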
maxymoo answered Sep 25 '22 03:09