Python Does Not Read Entire Text File

Tags: python, text, file-io, filesize

I'm running into a problem that I haven't seen anyone on StackOverflow encounter, and I can't find anything about it on Google either.

My main goal is to replace occurrences of a string in the file with another string. Is there a way to access all of the lines in the file?

The problem is that when I try to read in a large text file (1-2 GB), Python only reads a subset of it.

For example, I'll run a really simple command such as:

newfile = open("newfile.txt","w")
f = open("filename.txt","r")
for line in f:
    replaced = line.replace("string1", "string2")
    newfile.write(replaced)

And it only writes the first 382 MB of the original file. Has anyone encountered this problem before?

I tried a few different solutions such as using:

import fileinput
import sys

# With inplace=1, standard output is redirected into the file being read
for line in fileinput.input("filename.txt", inplace=1):
    sys.stdout.write(line.replace("string1", "string2"))

But it has the same effect. Reading the file in chunks doesn't help either, for example with:

f.read(10000)

I've narrowed it down to most likely being a reading problem and not a writing problem, because it also happens when I simply print out the lines. I know that there are more lines: when I open the file in a full text editor such as Vim, I can see what the last line should be, and it is not the last line that Python prints.
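One way to check this without iterating over the whole file is to read the last line directly from the end of the file in binary mode and compare it with what Vim shows. A minimal sketch, assuming Python 3 (the question uses 2.7); the helper name last_line is hypothetical:

```python
import os


def last_line(f, block_size=4096):
    """Return the last non-empty line of binary file object f, without its line ending.

    Reads backwards from the end in blocks of block_size bytes, so only the
    tail of a large file is touched. Trailing line endings are ignored.
    """
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    data = b""
    while pos > 0:
        step = min(block_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data
        # Stop once a newline appears before the trailing line ending.
        if b"\n" in data.rstrip(b"\r\n"):
            break
    return data.rstrip(b"\r\n").rsplit(b"\n", 1)[-1]
```

Usage would be something like: with open("filename.txt", "rb") as f: print(last_line(f)). If that matches Vim but not the last line printed by the text-mode loop, the file is being cut short on the reading side.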

Can anyone offer any advice or things to try?

I'm currently using a 32-bit version of Windows XP with 3.25 GB of RAM, running Python 2.7.

Edit: Solution found (thanks, Lattyware), using an iterator:

def read_in_chunks(file_obj, chunk_size=1000):
    """Lazily yield pieces of chunk_size characters until file_obj is exhausted."""
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        yield data
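The edit doesn't show how the chunks are consumed. One caveat if each chunk is passed straight to replace(): an occurrence of the search string that straddles two chunks would be missed. A minimal sketch of a chunked replace that holds back the last len(old) - 1 bytes of each chunk so that split matches are still found. It assumes Python 3, bytes read in binary mode (as the first answer below suggests), and the name replace_stream is hypothetical:

```python
def read_in_chunks(file_obj, chunk_size=1000):
    """Yield successive chunks from file_obj until it is exhausted."""
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        yield data


def replace_stream(src, dst, old, new, chunk_size=1024 * 1024):
    """Copy binary file src to dst, replacing every occurrence of old with new.

    Produces the same output as dst.write(src.read().replace(old, new))
    without loading the whole file into memory.
    """
    if not old:
        raise ValueError("old must be non-empty")
    pending = b""
    for chunk in read_in_chunks(src, chunk_size):
        pending += chunk
        out = []
        start = 0
        # Replace every complete match in the buffered data, left to right.
        while True:
            idx = pending.find(old, start)
            if idx == -1:
                break
            out.append(pending[start:idx])
            out.append(new)
            start = idx + len(old)
        # The last len(old) - 1 bytes could be the start of a match that
        # continues in the next chunk, so hold them back instead of writing.
        safe_end = max(start, len(pending) - (len(old) - 1))
        out.append(pending[start:safe_end])
        dst.write(b"".join(out))
        pending = pending[safe_end:]
    # Whatever is left cannot contain a full match; write it out unchanged.
    dst.write(pending)
```

Usage would be along the lines of: with open("filename.txt", "rb") as src, open("newfile.txt", "wb") as dst: replace_stream(src, dst, b"string1", b"string2"). A plain line-by-line loop avoids the boundary problem altogether, as long as the search string contains no newline.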

asked Mar 28 '12 10:03

user1297872

3 Answers

Try:

f = open("filename.txt", "rb")

On Windows, "rb" opens the file in binary mode. According to the docs, text mode vs. binary mode only affects end-of-line characters. But if I remember correctly, opening files in text mode on Windows also treats the byte 0x1A (Ctrl-Z) as end-of-file, so reading stops there.

You can also specify the mode when using fileinput:

fileinput.input("filename.txt", inplace=1, mode="rb")
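To test this theory, one could scan the file in binary mode for the first 0x1A byte; if its offset is close to where the text-mode read stops (about 382 MB in the question), that would explain the truncation. A minimal sketch, assuming Python 3 and a hypothetical helper name find_ctrl_z. As far as I know, Python 3's own text mode does not treat 0x1A specially, so this is mainly a diagnostic for the Python 2 on Windows behavior described above.

```python
def find_ctrl_z(f, chunk_size=1024 * 1024):
    """Return the byte offset of the first 0x1A (Ctrl-Z) byte in binary file f, or -1."""
    offset = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return -1
        idx = chunk.find(b"\x1a")
        if idx != -1:
            return offset + idx
        # A single byte cannot straddle two chunks, so no overlap handling is needed.
        offset += len(chunk)
```

Usage would be: with open("filename.txt", "rb") as f: print(find_ctrl_z(f)).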

answered Oct 16 '22 16:10

codeape

Are you sure the problem is with reading and not with writing? Do you close the file that is written to, either explicitly with newfile.close() or by using a with statement?

Not closing the output file is a common source of such problems whenever output is buffered: data still sitting in the buffer never reaches the disk. If that's the case in your setting too, closing the file should fix your initial solutions.
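A small sketch that illustrates the buffering effect, assuming Python 3 (the helper name sizes_around_close is hypothetical): right after write(), a small amount of data typically still sits in Python's buffer, so the file on disk stays empty until close() flushes it.

```python
import os
import tempfile


def sizes_around_close(text):
    """Return (size on disk before close, size on disk after close) for a fresh file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        out = open(path, "w")
        try:
            out.write(text)
            before = os.path.getsize(path)  # small writes usually sit in the buffer
        finally:
            out.close()  # close() flushes the buffer to disk
        after = os.path.getsize(path)
    finally:
        os.remove(path)
    return before, after
```

For example, sizes_around_close("hello") should report 0 bytes before closing and 5 after. A with statement gives the same guarantee as the try/finally used here.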

answered Oct 16 '22 17:10

benroth

If you use the file like this:

with open("filename.txt") as f, open("newfile.txt", "w") as newfile:
    for line in f:
        newfile.write(line.replace("string1", "string2"))

This should only read one line into memory at a time, unless you keep a reference to that line somewhere. After each line is processed, it is up to Python's garbage collector to free it. Both files are also closed automatically when the with block ends. Give this a try and see if it works for you :)
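Combining this with the binary-mode suggestion from the first answer and the closing advice from the second, a minimal sketch might look like the following. It assumes Python 3, where iterating over a binary file yields bytes lines; replace_lines is a hypothetical name. Because each line is processed separately, it also assumes the search string contains no newline.

```python
def replace_lines(src_path, dst_path, old, new):
    """Copy src_path to dst_path line by line, replacing old with new.

    Both files are opened in binary mode and closed by the with statement.
    Returns the number of lines processed, which can be compared with the
    line count an editor such as Vim reports.
    """
    count = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # the file object yields one line at a time
            dst.write(line.replace(old, new))
            count += 1
    return count
```

Usage would be: replace_lines("filename.txt", "newfile.txt", b"string1", b"string2").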

answered Oct 16 '22 17:10

Serdalis
