Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Proper way to use PyMySql in a memory efficient generator

I want to write a generator function that will run on a memory limited system that uses PyMySql (or MySQLDb) to return the results of a select query one at a time. The following works:

#execute a select query and return results as a generator
def SQLSelectGenerator(self,stmt):
    #error handling code removed
    cur.execute(stmt)

    row = ""
    while row is not None:
        row = self.cur.fetchone()
        yield row

However, the following also seems to work, but it is cryptic as to whether it is performing a fetchall(). I can't find in the Python DB API what exactly happens when you iterate a cursor object as a list:

#execute a select query and return results as a generator
def SQLSelectGenerator(self,stmt):
    #error handling code removed
    cur.execute(stmt)

 for row in self.cur:
    yield row

In both cases, the following prints all the rows sucessfully

stmt = "select * from ..."
for l in SQLSelectGenerator(stmt):
    print(l)

So I would like to know whether the second implementation is better or worse, and whether it is calling fetchall or doing something tricky with fetchone. Fetchall will blow up the system this will run on because there are millions of rows.

like image 379
Tommy Avatar asked Oct 21 '22 05:10

Tommy


1 Answers

According to the PyMySql source, doing

for row in self.cur:
   yield row

Meaning you're internally executing fetchone() repeatedly, just like your first example:

class Cursor(object):
    '''
    This is the object you use to interact with the database.
    '''
    ...
    def __iter__(self):
        return iter(self.fetchone, None)

So I would expect the two approaches to be essentially equal in terms of memory usage and performance. You may as well use the second one since its cleaner and simpler.

like image 114
dano Avatar answered Oct 24 '22 10:10

dano