
How does readline() work behind the scenes when reading a text file?

Tags:

python

I would like to understand how readline() takes in a single line from a text file. The specific details I would like to know about, with respect to how the compiler interprets the Python language and how this is handled by the CPU, are:

  1. How does the readline() know which line of text to read, given that successive calls to readline() read the text line by line?
  2. Is there a way to start reading a line of text from the middle of a text? How would this work with respect to the CPU?

I am a "beginner" (I have about 4 years of "simpler" programming experience), so I wouldn't be able to understand technical details, but feel free to expand if it could help others understand!

asked Mar 11 '23 16:03 by ragingasiancoder


1 Answer

Example using the file file.txt:

fake file
with some text
in a few lines

Question 1: How does the readline() know which line of text to read, given that successive calls to readline() read the text line by line?

When you open a file in Python, it creates a file object. A file object wraps an operating-system file descriptor and keeps track of a current position in the file. When you first open the file for reading, that position is at the beginning of the file. Each call to readline() reads from the current position up to and including the next newline and leaves the position just after that newline, so the next call continues from there.

Calling the tell() method of a file object returns the current position.

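with open('file.txt', 'r') as fd:
    print(fd.tell())   # position before reading anything
    fd.readline()      # reads "fake file" plus its newline
    print(fd.tell())   # position just after the first line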

# output:
0
10
# Or 11, depending on the line separators in the file
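
To make the idea of a "current position" concrete, here is a minimal sketch of a reader that behaves like readline(). It is a toy model, not how Python's real file objects are implemented (those add buffering and text decoding on top); the class name ToyReader and its attributes are made up for illustration.

```python
class ToyReader:
    """Toy model of a file object: some bytes plus a current position."""

    def __init__(self, data):
        self.data = data   # entire file contents, as bytes
        self.pos = 0       # current position, starts at the beginning

    def tell(self):
        return self.pos

    def readline(self):
        # Look for the next newline at or after the current position.
        end = self.data.find(b"\n", self.pos)
        if end == -1:
            end = len(self.data)   # no newline left: read to the end
        else:
            end += 1               # include the newline itself
        line = self.data[self.pos:end]
        self.pos = end             # the next call starts from here
        return line


reader = ToyReader(b"fake file\nwith some text\nin a few lines\n")
first = reader.readline()
second = reader.readline()
print(first, second, reader.tell())
```

The only state that readline() needs between calls is the position, which is why successive calls naturally walk through the file line by line.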


Question 2: Is there a way to start reading a line of text from the middle of a text? How would this work with respect to the CPU?

First off, reading a file doesn't really have much to do with the CPU itself. It is handled by the operating system and the file system, which together determine how files can be read and written. See: Barebones explanation of files

For random access in files, you can use Python's mmap module, which maps the file's bytes into memory so you can slice them like a string. The Python Module of the Week site has a great tutorial.

Example, jumping to the 2nd line in the example file and reading until the end:

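import mmap

# Open in binary mode: mmap works on raw bytes, not decoded text
with open('file.txt', 'rb') as fd:
    # Length 0 maps the whole file; mmap objects are context managers in Python 3
    with mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Offset 10 is the start of the 2nd line (use 11 for \r\n line endings)
        print(mm[10:].decode(), end='')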

# output:
with some text
in a few lines
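
A simpler alternative for jumping to a known position, which the answer above does not use, is the file object's own seek() method. The sketch below recreates the example file in a temporary directory (an assumption made only so the snippet is self-contained) and opens it in binary mode so that offsets are plain byte counts.

```python
import os
import tempfile

# Recreate the example file in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "wb") as f:
    f.write(b"fake file\nwith some text\nin a few lines\n")

with open(path, "rb") as fd:
    fd.seek(10)                    # move the position to byte 10 (start of line 2)
    second_line = fd.readline()    # reads from there up to the next newline
    position = fd.tell()           # now just past the 2nd line

print(second_line)
print(position)
```

Note that seek() moves to a byte offset, not a line number. To jump to "line N" you either need to know its offset in advance or read through the preceding lines once to find it.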

answered Mar 14 '23 07:03 by xgord