
Python: Read whitespace separated strings from file similar to readline

Tags:

python

file-io

In Python, f.readline() returns the next line from the file f. That is, it starts at the current position of f, reads till it encounters a line break, returns everything in between and updates the position of f.
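A small illustration of the readline() behaviour described above. It uses io.StringIO as a stand-in for a real file opened with open(), which is an assumption made only so the example runs on its own:

```python
import io

# io.StringIO stands in for a file opened with open(); it supports readline().
f = io.StringIO("first line\nsecond line\n")

print(repr(f.readline()))  # 'first line\n'  (the newline is included)
print(repr(f.readline()))  # 'second line\n'
print(repr(f.readline()))  # ''  (an empty string signals end of file)
```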

Now I want to do exactly the same, but for whitespace-separated files (splitting on any whitespace, not only on newlines). For example, consider a file f with the content

token1 token2

token3                            token4


         token5

So I'm looking for some function readtoken() such that, after opening f, the first call of f.readtoken() returns token1, the second call returns token2, and so on.

For efficiency and to avoid problems with very long lines or very large files, there should be no buffering.

I was almost sure that this should be possible "out of the box" with the standard library. However, I didn't find any suitable function or a way to redefine the delimiters for readline().

asked Feb 16 '23 14:02 by azimut


1 Answer

You'd need to create a wrapper function; this is easy enough:

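def read_by_tokens(fileobj):
    # Iterate over the file line by line (the file object buffers reads internally)
    for line in fileobj:
        # str.split() with no argument splits on any run of whitespace
        # and drops empty strings, so blank lines yield nothing
        for token in line.split():
            yield token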
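The generator above still holds one complete line in memory at a time, which matters if lines can be very long (one of the question's concerns). A possible variant, sketched here under the assumption that tokens themselves are reasonably short, reads fixed-size chunks instead of lines and carries a partial token over to the next chunk. The names read_tokens_chunked and chunk_size are hypothetical, chosen only for this illustration:

```python
import io
from typing import Iterator, TextIO


def read_tokens_chunked(fileobj: TextIO, chunk_size: int = 8192) -> Iterator[str]:
    """Yield whitespace-separated tokens, reading fixed-size chunks.

    Hypothetical helper: memory use is bounded by chunk_size plus the
    length of the longest single token, not by the longest line.
    """
    leftover = ""  # partial token carried over from the previous chunk
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        data = leftover + chunk
        tokens = data.split()
        # If the data does not end in whitespace, the last token may
        # continue in the next chunk, so hold it back for now.
        if tokens and not data[-1].isspace():
            leftover = tokens.pop()
        else:
            leftover = ""
        yield from tokens
    # Whatever is still held back at end of file is a complete token.
    if leftover:
        yield leftover


# Usage with an in-memory file; a file from open() would be used the same way.
sample = io.StringIO("token1 token2\n\ntoken3    token4\n\n\n   token5\n")
print(list(read_tokens_chunked(sample, chunk_size=4)))
```

A deliberately tiny chunk_size like 4 is used only to exercise tokens that straddle chunk boundaries; in practice a few kilobytes is a more sensible value.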

Note that .readline() doesn't just read a file character by character until a newline is encountered; the file is read in blocks (a buffer) to improve performance.
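If what you want is the exact readline()-like contract from the question (each call continues from the current position, and an empty string means end of file), one option is to read one character at a time. This is a sketch, not a standard-library function; read_token is a hypothetical name. Keep in mind that read(1) still goes through Python's buffered I/O layer underneath, and calling it per character is considerably slower than the line-based generator:

```python
import io
from typing import TextIO


def read_token(fileobj: TextIO) -> str:
    """Hypothetical helper: return the next token, or '' at end of file.

    Like readline(), each call continues from the current position and
    consumes the single whitespace character that terminates the token
    (much as readline() consumes the newline).
    """
    # Skip any leading whitespace.
    ch = fileobj.read(1)
    while ch and ch.isspace():
        ch = fileobj.read(1)
    # Collect characters until the next whitespace or end of file.
    chars = []
    while ch and not ch.isspace():
        chars.append(ch)
        ch = fileobj.read(1)
    return "".join(chars)


f = io.StringIO("token1 token2\n\n   token3\n")
print(read_token(f))        # token1
print(read_token(f))        # token2
print(read_token(f))        # token3
print(repr(read_token(f)))  # ''
```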

The above method reads the file by lines but yields the result split on whitespace. Use it like:

with open('somefilename') as f:
    for token in read_by_tokens(f):
        print(token)
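To check the generator against the sample file from the question, the content can be fed in through io.StringIO (used here only so the example needs no file on disk). The version below writes the inner loop as yield from line.split(), which is equivalent to the nested for loop in Python 3.3 and later:

```python
import io


def read_by_tokens(fileobj):
    # Same idea as above: iterate lines, split each on any whitespace.
    for line in fileobj:
        yield from line.split()


# The sample file from the question, held in memory.
sample = io.StringIO(
    "token1 token2\n"
    "\n"
    "token3                            token4\n"
    "\n"
    "\n"
    "         token5\n"
)
print(list(read_by_tokens(sample)))
# ['token1', 'token2', 'token3', 'token4', 'token5']
```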

Because read_by_tokens() is a generator, you either need to loop directly over the function result, or use the next() function to get tokens one by one:

with open('somefilename') as f:
    tokenized = read_by_tokens(f)

    # read first two tokens separately
    first_token = next(tokenized)
    second_token = next(tokenized)

    for token in tokenized:
        # loops over all tokens *except the first two*
        print(token)
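Calling next() on an exhausted generator raises StopIteration. If you would rather have a method that behaves like readline() and returns an empty string at end of file, next() accepts a default value as its second argument. A minimal sketch of such a wrapper follows; the class TokenReader and its readtoken() method are hypothetical names, not part of the standard library:

```python
import io
from typing import Iterator, TextIO


class TokenReader:
    """Hypothetical wrapper giving a file a readline()-style readtoken()."""

    def __init__(self, fileobj: TextIO) -> None:
        self._tokens: Iterator[str] = self._generate(fileobj)

    @staticmethod
    def _generate(fileobj: TextIO) -> Iterator[str]:
        # Same line-based tokenizing as read_by_tokens() above.
        for line in fileobj:
            yield from line.split()

    def readtoken(self) -> str:
        # next() with a default returns '' instead of raising StopIteration,
        # mirroring readline() at end of file.
        return next(self._tokens, "")


reader = TokenReader(io.StringIO("token1 token2\n\ntoken3\n"))
print(reader.readtoken())        # token1
print(reader.readtoken())        # token2
print(reader.readtoken())        # token3
print(repr(reader.readtoken()))  # ''
```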
answered Feb 19 '23 11:02 by Martijn Pieters