
csv.writer.writerows takes iterator?

Tags:

python

csv

The documentation for writerows states

Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.

This suggests that writerows takes a list as its parameter, but it accepts an iterator without any problem:

python -c 'import csv
> csv.writer(open("test.file.1", "w")).writerows(([x] for x in xrange(10)))
> '
cat test.file.1
0
1
2
3
4
5
6
7
8
9

What gives? Does it convert the iterator to a list before writing to the file, or is the documentation misleading and it can actually write iterators to files without materializing them? The underlying code is in C; I can't make sense of it.

iruvar asked Sep 28 '22 17:09


1 Answer

According to the sources of the csv module, the DictWriter class does first build a list of rows, which it then passes to the actual writer. See line 155:

def writerows(self, rowdicts):
    rows = []
    for rowdict in rowdicts:
        rows.append(self._dict_to_list(rowdict))
    return self.writer.writerows(rows)

The funny thing is that the writer class, which is implemented in the _csv module (the C extension), does not need a list. We can see from the sources that it simply obtains an iterator from the argument with PyObject_GetIter and then calls PyIter_Next in a loop:

csv_writerows(WriterObj *self, PyObject *seqseq)
{
    PyObject *row_iter, *row_obj, *result;

    row_iter = PyObject_GetIter(seqseq);
    // [...]
    while ((row_obj = PyIter_Next(row_iter))) {
        result = csv_writerow(self, row_obj);
        // [...]
    }
    // [...]
}

Note that there is no call to any PyList_* function, nor any check for the list type at all.
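The same behaviour can be observed from Python without reading the C source. The following is a minimal sketch, assuming CPython 3; RecordingFile, rows and events are hypothetical names introduced only for this illustration. It logs every row the generator yields and every write call the writer makes, so the order of the log shows whether rows are consumed one at a time or collected first.

```python
import csv

# Shared log of what happens, in order.
events = []


class RecordingFile:
    """Hypothetical file-like object; csv.writer only needs write()."""

    def write(self, text):
        events.append(("write", text))
        return len(text)


def rows(n):
    """Hypothetical generator that logs each row just before yielding it."""
    for i in range(n):
        events.append(("yield", i))
        yield [i]


csv.writer(RecordingFile()).writerows(rows(3))

for event in events:
    print(event)
```

If writerows built a list first, all the "yield" entries would come before the first "write". Instead the two kinds of entries alternate, which matches the PyIter_Next loop shown above.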

In any case, both writerows methods accept any iterable; DictWriter, however, creates an (unnecessary) intermediate list. It is possible that in earlier versions the writer class accepted only lists and DictWriter therefore had to do that conversion, but by now it is outdated.

In current versions of Python, the DictWriter.writerows method could be reimplemented as:

def writerows(self, rowdicts):
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
    # or (also lazy on Python 2, where map() builds a list):
    #return self.writer.writerows(self._dict_to_list(row) for row in rowdicts)

which ought to have the same behaviour, except that it avoids the unnecessary creation of the list of rows.
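As a rough illustration of that reimplementation, the sketch below applies it in a subclass instead of patching the standard library. LazyDictWriter is a hypothetical name, and the code relies on the private helper _dict_to_list used in the snippet above, which is an implementation detail of csv.DictWriter and could change between versions. Python 3 is assumed.

```python
import csv
import io


class LazyDictWriter(csv.DictWriter):
    """Hypothetical subclass: writerows() without an intermediate list."""

    def writerows(self, rowdicts):
        # Hand the underlying writer a generator; rows are converted
        # one at a time as the C writer asks for them.
        return self.writer.writerows(
            self._dict_to_list(row) for row in rowdicts
        )


buf = io.StringIO()
writer = LazyDictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
# The argument is itself a generator, so no list of rows exists at any point.
writer.writerows({"id": i, "name": "row%d" % i} for i in range(3))

print(buf.getvalue())
```

The output is the same as with a plain DictWriter; only the way the rows reach the underlying writer differs.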

Bakuriu answered Oct 03 '22 06:10