I have a file in which lines are separated using a delimiter, say '.'. I want to read this file line by line, where lines are defined by the presence of '.' instead of a newline.
One way is:
f = open('file', 'r')
for line in f.read().strip().split('.'):
    pass  # ... do some work
f.close()
But this is not memory efficient if my file is very large. Instead of reading the whole file at once, I want to read it line by line.
open supports a parameter 'newline', but this parameter only takes None, '', '\n', '\r', and '\r\n' as input, according to the documentation.
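To make the restriction concrete, here is a small check of what happens when '.' is passed as the newline argument. It is only an illustration: the temporary file and the variable names are assumptions made for the example, not part of the original problem.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first.second.third")
    path = tmp.name

error = None
try:
    # '.' is not one of the accepted newline values, so this call fails.
    open(path, "r", newline=".")
except ValueError as exc:
    error = str(exc)
    print(error)
finally:
    os.remove(path)
```

So the built-in line iteration cannot be pointed at an arbitrary delimiter; the splitting has to be done by hand.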
Is there any way to read a file efficiently, one record at a time, based on a pre-specified delimiter?
The readline method reads one line from the file and returns it as a string. The string returned by readline will contain the newline character at the end.
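As a quick illustration of that behaviour, the snippet below uses an in-memory io.StringIO object as a stand-in for a real file (an assumption made to keep the example self-contained).

```python
import io

# An in-memory text stream standing in for a real file.
f = io.StringIO("first\nsecond\n")

first = f.readline()   # includes the trailing newline
second = f.readline()  # includes the trailing newline
at_end = f.readline()  # empty string signals end of file

print(repr(first), repr(second), repr(at_end))
```

Note that readline always splits on newlines, which is why it does not help with a custom delimiter such as '.'.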
You could use a generator:
def myreadlines(f, newline):
    buf = ""
    while True:
        # yield every complete line currently in the buffer
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        chunk = f.read(4096)
        if not chunk:
            # end of file: whatever is left is the last line
            yield buf
            break
        buf += chunk

with open('file') as f:
    for line in myreadlines(f, "."):
        print(line)
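One thing worth checking with this approach is that a delimiter longer than one character still works when it straddles a chunk boundary. The sketch below is a variation on the generator above, not the same code: it uses str.split instead of index, and the name read_records and the chunk_size parameter are hypothetical additions so that a tiny chunk size can force the boundary case.

```python
import io

def read_records(f, delimiter, chunk_size=4096):
    """Yield delimiter-separated records from a text file object."""
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            # End of file: whatever follows the last delimiter.
            yield buf
            return
        buf += chunk
        # Everything before the last delimiter is complete; the final
        # piece may be a partial record (or half a delimiter), so keep it.
        *records, buf = buf.split(delimiter)
        yield from records

# A 3-character chunk size splits the 2-character delimiter "<>"
# across reads, which the carried-over buffer has to absorb.
stream = io.StringIO("alpha<>beta<>gamma")
records = list(read_records(stream, "<>", chunk_size=3))
print(records)
```

Because the unfinished tail is always carried into the next read, only one chunk plus one partial record is held in memory at a time.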
Here is a more efficient answer, using FileIO and bytearray, that I used for parsing a PDF file:
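import io
import re

# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')

# the end-of-file marker
EOF = b'%%EOF'

def readlines(fio):
    buf = bytearray(4096)
    pending = b''  # unfinished line carried over to the next read
    while True:
        n = fio.readinto(buf)  # number of bytes actually read
        if not n:
            break
        data = pending + bytes(buf[:n])
        eof_pos = data.find(EOF)
        if eof_pos != -1:
            # stop at the EOF marker; everything before it is complete
            yield from EOL_REGEX.split(data[:eof_pos])
            return
        lines = EOL_REGEX.split(data)
        pending = lines.pop()
        if data.endswith(b'\r'):
            # hold back a trailing '\r': its '\n' may arrive with the next read
            pending = lines.pop() + b'\r'
        yield from lines
    if pending:
        yield pending.rstrip(b'\r')

with io.FileIO("test.pdf") as fio:
    for line in readlines(fio):
        ...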
The above example also handles a custom EOF. If you don't want that, use this:
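import io
import os
import re

# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')

def readlines(fio, size):
    buf = bytearray(4096)
    pending = b''  # unfinished line carried over to the next read
    while fio.tell() < size:
        n = fio.readinto(buf)  # number of bytes actually read
        if not n:
            break
        data = pending + bytes(buf[:n])
        lines = EOL_REGEX.split(data)
        pending = lines.pop()
        if data.endswith(b'\r'):
            # hold back a trailing '\r': its '\n' may arrive with the next read
            pending = lines.pop() + b'\r'
        yield from lines
    if pending:
        yield pending.rstrip(b'\r')

size = os.path.getsize("test.pdf")
with io.FileIO("test.pdf") as fio:
    for line in readlines(fio, size):
        ...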