I am curious how files work in Python. How is a file object implemented so that it can be looped over like this:
    csv_file = open("filename.csv", "r")
    for line in csv_file:
        # do something with line
If you're using Python 2, the details are a little murky; alexmcf's answer covers the basics, and you can look up further details from there.
If you're using Python 3, everything is documented in great detail in the io module, which comes with a reasonably readable pure Python implementation in the stdlib (the _pyio module), all built on top of nothing more than a very simple "raw file" interface (which FileIO implements on top of POSIX native file descriptors on Unix).
The IOBase ABC/mixin provides an __iter__ method based on the readline method:
"IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings). See readline() below."
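As a quick illustration of the quoted passage, here is a minimal sketch that iterates over two in-memory streams. io.BytesIO and io.StringIO are stand-ins chosen for this example so that no real file is needed, and the sample data is made up.

```python
import io

# In-memory stand-ins for a binary file and a text file (sample data is made up).
binary_stream = io.BytesIO(b"a,b\n1,2\n")
text_stream = io.StringIO("a,b\n1,2\n")

# Iterating a binary stream yields bytes; iterating a text stream yields str.
binary_lines = list(binary_stream)
text_lines = list(text_stream)

print(binary_lines)  # [b'a,b\n', b'1,2\n']
print(text_lines)    # ['a,b\n', '1,2\n']
```

Each item keeps its trailing newline, which is why code that loops over a file usually strips it.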
And if you look inside the 3.5 source, it's as simple as you'd expect it to be:
    def __iter__(self):
        self._checkClosed()
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line
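To connect those two methods back to the loop in the question, here is a rough sketch of what a for loop does with them, written out by hand. It uses io.StringIO with made-up data as a stand-in for a real file; the expansion is an approximation of the loop's behavior, not the interpreter's actual code.

```python
import io

# Made-up data; the middle line is blank on purpose.
stream = io.StringIO("first\n\nthird\n")

lines = []
iterator = iter(stream)        # calls stream.__iter__(), which returns the stream itself
while True:
    try:
        line = next(iterator)  # calls stream.__next__(), which reads one line
    except StopIteration:      # raised once there is nothing left to read
        break
    lines.append(line)

print(iterator is stream)  # True
print(lines)               # ['first\n', '\n', 'third\n']
```

Note that the blank line comes back as "\n", which is not empty, so it does not end the iteration; only the empty string returned at end of file does.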
Of course in CPython 3.1+, there's a C accelerator that's used in place of that Python code if possible, but it looks pretty similar:
    static PyObject *
    iobase_iter(PyObject *self)
    {
        if (_PyIOBase_check_closed(self, Py_True) == NULL)
            return NULL;
        Py_INCREF(self);
        return self;
    }

    static PyObject *
    iobase_iternext(PyObject *self)
    {
        PyObject *line = PyObject_CallMethodObjArgs(self, _PyIO_str_readline, NULL);
        if (line == NULL)
            return NULL;
        if (PyObject_Size(line) == 0) {
            Py_DECREF(line);
            return NULL;
        }
        return line;
    }
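Both versions look up readline on the object itself, so a subclass that only supplies readline should get line iteration for free. The sketch below tries that out; ListBackedStream is a hypothetical class invented for this example, not something from the stdlib.

```python
import io


class ListBackedStream(io.IOBase):
    """Hypothetical stream that serves lines from a list (illustration only)."""

    def __init__(self, lines):
        super().__init__()
        self._lines = list(lines)
        self._pos = 0

    def readline(self, size=-1):
        # Return the next stored line, or an empty string at "end of file".
        if self._pos >= len(self._lines):
            return ""
        line = self._lines[self._pos]
        self._pos += 1
        return line


stream = ListBackedStream(["a\n", "b\n"])
collected = [line for line in stream]  # uses the inherited __iter__ / __next__
print(collected)  # ['a\n', 'b\n']
```

The class never defines __iter__ or __next__; the inherited ones keep calling readline until it returns an empty string.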
The file objects returned by open, and automatically created for things like sys.stdout, are instances of TextIOWrapper (for text files), or BufferedRandom, BufferedReader, or BufferedWriter (for binary files), all of which inherit this behavior from IOBase. Most or all file objects created anywhere else in the stdlib (GzipFile, etc.) also derive from IOBase and inherit it the same way. There's nothing stopping a different file class from overriding __iter__ (or registering with IOBase as an ABC instead of inheriting it), but I don't know of any that do.
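One way to check this on your own interpreter is a small sketch like the following, which writes a throwaway file and inspects what open returns in text and binary mode. The file name example.csv and its contents are made up for the example.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"a,b\n1,2\n")

    # Text mode: open() hands back a TextIOWrapper.
    with open(path, "r") as text_file:
        text_type = type(text_file)
        text_is_iobase = isinstance(text_file, io.IOBase)
        text_lines = list(text_file)

    # Binary read mode: open() hands back a BufferedReader.
    with open(path, "rb") as binary_file:
        binary_type = type(binary_file)
        binary_is_iobase = isinstance(binary_file, io.IOBase)
        binary_lines = list(binary_file)

print(text_type.__name__, text_is_iobase, text_lines)
# TextIOWrapper True ['a,b\n', '1,2\n']
print(binary_type.__name__, binary_is_iobase, binary_lines)
# BufferedReader True [b'a,b\n', b'1,2\n']
```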