
Is there a way to read a file in a loop in Python using a separator other than newline?

Tags:

python

I usually read files like this in Python:

f = open('filename.txt', 'r')
for x in f:
    doStuff(x)
f.close()

However, this splits the file by newlines. I now have a file which has all of its info in one line (45,000 strings separated by commas). While a file of this size is trivial to read in using something like

f = open('filename.txt', 'r')
doStuff(f.read())
f.close()

I am curious whether, for a much larger file that is all on one line, it would be possible to get the same kind of iteration as in the first snippet, but splitting on commas (or any other character) instead of newlines.

vasek1 asked Apr 17 '12 01:04


1 Answer

The following function is a fairly straightforward way to do what you want:

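def file_split(f, delim=',', bufsize=1024):
    # Yield delim-separated items from f, reading bufsize characters at a time.
    # Note: delim is assumed to be a single character; a longer delimiter that
    # straddles two reads would be missed.
    prev = ''  # incomplete item carried over from the previous chunk
    while True:
        s = f.read(bufsize)
        if not s:
            break  # end of file
        split = s.split(delim)
        if len(split) > 1:
            # the first piece completes the carried-over item
            yield prev + split[0]
            # the last piece may be cut off, so hold it back
            prev = split[-1]
            for x in split[1:-1]:
                yield x
        else:
            # no delimiter in this chunk: keep accumulating
            prev += s
    if prev:
        yield prev  # final item (no trailing delimiter)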

You would use it like this:

with open('filename.txt') as f:
    for item in file_split(f):
        doStuff(item)

This should be faster than the solution that EMS linked, and for large files it uses far less memory than reading the entire file at once, because it reads only bufsize characters at a time.
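
One caveat: the function above splits each chunk on its own, so it assumes a single-character delimiter. If the separator can be longer (say '||'), it may be cut in half at a chunk boundary and go unnoticed. A minimal sketch of a variant that avoids this by keeping the unfinished tail in a buffer and splitting the buffer instead of the raw chunk; the name iter_delimited is made up for illustration and is not from the answer above:

```python
import io


def iter_delimited(f, delim=',', bufsize=1024):
    """Yield delim-separated items from a text file object.

    Hypothetical variant of file_split: the delimiter may be longer than
    one character.
    """
    if not delim:
        raise ValueError("delim must be a non-empty string")
    buf = ''  # unfinished tail, possibly ending in part of a delimiter
    while True:
        chunk = f.read(bufsize)
        if not chunk:
            break  # end of file
        buf += chunk
        parts = buf.split(delim)
        # The last piece may be incomplete, so hold it back for the next read.
        buf = parts.pop()
        for part in parts:
            yield part
    if buf:
        yield buf  # final item when the file does not end with a delimiter


# Small in-memory check with a tiny buffer to force splits across reads.
items = list(iter_delimited(io.StringIO("a||b||c"), delim="||", bufsize=2))
print(items)
```

Like the original, this skips a trailing empty item when the file ends with the delimiter. The trade-off is that the buffer is re-split on every read, so a single very long item costs more than it does in file_split.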

Andrew Clark answered Nov 01 '22 11:11