Reading a file with a specified delimiter for newline

I have a file in which lines are separated by a delimiter, say '.'. I want to read this file line by line, where the lines are split on '.' instead of on newline.

One way is:

with open('file', 'r') as f:
    for line in f.read().strip().split('.'):
        pass  # ... do some work with line

But this is not memory efficient if my file is large, because f.read() loads the whole file at once. Instead of reading the whole file, I want to read it one line at a time.

open supports a 'newline' parameter, but it only accepts None, '', '\n', '\r', and '\r\n', according to the documentation.
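As a quick check of that restriction, the sketch below tries to pass '.' as the newline argument. It assumes Python 3 (on Python 2.7 the newline parameter belongs to io.open rather than the built-in open). The helper name try_custom_newline and the temporary file are illustrative only and not part of the original question.

```python
import os
import tempfile


def try_custom_newline(path, newline):
    """Return the error message open() raises for this newline, or None.

    Hypothetical helper, used only to demonstrate the restriction.
    """
    try:
        with open(path, 'r', newline=newline):
            return None
    except ValueError as exc:
        return str(exc)


# Throwaway file just for the demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    accepted = try_custom_newline(path, '\n')  # one of the allowed values
    rejected = try_custom_newline(path, '.')   # not an allowed value
finally:
    os.remove(path)

print(accepted)
print(rejected)
```

Under these assumptions, the allowed value opens the file normally, while '.' is rejected with a ValueError before any data is read, so the splitting has to be done in user code.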

Is there any way to read a file line by line efficiently, but based on a pre-specified delimiter?

364

asked Apr 28 '13 05:04

Abhishek Gupta

2 Answers

You could use a generator:

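def myreadlines(f, newline):
    # Yield pieces of f separated by the string newline.
    buf = ""
    while True:
        # Emit every complete line currently in the buffer.
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        # Refill the buffer; an empty chunk means end of file.
        chunk = f.read(4096)
        if not chunk:
            yield buf  # whatever follows the last delimiter
            break
        buf += chunk

with open('file') as f:
    for line in myreadlines(f, "."):
        print(line)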
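A possible variant of the same idea, shown here only as a sketch: split each refilled buffer once with str.split and carry the last, possibly incomplete, piece over to the next read. The name iter_delimited and the chunk_size parameter are hypothetical, and the io.StringIO sample assumes Python 3. The small chunk size is chosen on purpose so that the two-character delimiter straddles read boundaries.

```python
import io


def iter_delimited(f, delimiter, chunk_size=4096):
    """Yield pieces of text file f separated by delimiter (hypothetical helper)."""
    tail = ""  # text after the last delimiter seen so far
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        pieces = (tail + chunk).split(delimiter)
        # The last piece may be incomplete, or may end with half of a
        # multi-character delimiter, so hold it back for the next read.
        tail = pieces.pop()
        for piece in pieces:
            yield piece
    yield tail  # whatever follows the final delimiter, possibly empty


# A tiny chunk size forces the delimiter to straddle read boundaries.
sample = io.StringIO("alpha||beta||gamma")
records = list(iter_delimited(sample, "||", chunk_size=3))
print(records)  # ['alpha', 'beta', 'gamma']
```

Like the generator above, this keeps at most one chunk plus one unfinished record in memory. A file that ends with the delimiter produces a final empty string, which mirrors what str.split does.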

143

answered Oct 07 '22 21:10

NPE

Here is a more efficient answer using FileIO and a bytearray, which I used for parsing a PDF file:

import io
import re


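# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')

# the end-of-file marker
EOF = b'%%EOF'


def readlines(fio):
    buf = bytearray(4096)
    pending = b''  # bytes read so far that follow the last EOL
    while True:
        n = fio.readinto(buf)  # number of bytes actually read
        if not n:
            break  # physical end of file, marker never seen
        pending += bytes(buf[:n])  # skip stale bytes left over past n
        eof = pending.find(EOF)
        if eof != -1:
            pending = pending[:eof]  # drop the marker and anything after it
            break
        if pending.endswith(b'\r'):
            continue  # may be the first half of '\r\n', so read more first
        lines = EOL_REGEX.split(pending)
        pending = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line
    for line in EOL_REGEX.split(pending):
        yield line
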

with io.FileIO("test.pdf") as fio:
    for line in readlines(fio):
        ...

The above example also handles a custom EOF. If you don't want that, use this:

import io
import os
import re


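# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')


def readlines(fio, size):
    buf = bytearray(4096)
    pending = b''  # bytes read so far that follow the last EOL
    while fio.tell() < size:
        n = fio.readinto(buf)  # number of bytes actually read
        if not n:
            break  # file is shorter than size
        pending += bytes(buf[:n])  # skip stale bytes left over past n
        if pending.endswith(b'\r'):
            continue  # may be the first half of '\r\n', so read more first
        lines = EOL_REGEX.split(pending)
        pending = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line
    for line in EOL_REGEX.split(pending):
        yield line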

size = os.path.getsize("test.pdf")
with io.FileIO("test.pdf") as fio:
    for line in readlines(fio, size):
         ...
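The two snippets above split on the usual end-of-line bytes. To tie the readinto approach back to the original question, here is a rough sketch that splits a binary stream on an arbitrary delimiter such as b'.'. The names iter_records, delimiter and bufsize are hypothetical, and io.BytesIO stands in for a real file opened with io.FileIO.

```python
import io


def iter_records(raw, delimiter=b'.', bufsize=4096):
    """Yield bytes records from a binary stream, split on delimiter.

    Hypothetical helper; raw is anything with a readinto() method.
    """
    buf = bytearray(bufsize)  # reused for every read
    view = memoryview(buf)
    tail = b''                # bytes after the last delimiter seen so far
    while True:
        n = raw.readinto(buf)  # number of bytes actually read
        if not n:
            break
        # Only the first n bytes are fresh; the rest is left over from
        # earlier reads and must be ignored.
        pieces = (tail + view[:n].tobytes()).split(delimiter)
        tail = pieces.pop()    # possibly incomplete, keep for the next read
        for piece in pieces:
            yield piece
    yield tail


stream = io.BytesIO(b'one.two.three')
records = list(iter_records(stream, bufsize=4))
print(records)  # [b'one', b'two', b'three']
```

Whether this is actually faster than the text-mode generator depends on the file and the record sizes, so it is worth measuring before choosing one over the other.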

answered Oct 07 '22 20:10

Dev Aggarwal

Tags: python, io, file-io, python-2.7
