Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python script hangs when calling cursor.fetchall() with large data set

I have a query that returns over 125K rows.

The goal is to write a script the iterates through the rows, and for each, populate a second table with data processed from the result of the query.

To develop the script, I created a duplicate database with a small subset of the data (4126 rows)

On the small database, the following code works:

import os
import sys
import random

import mysql.connector

cnx = mysql.connector.connect(user='dbuser', password='thePassword',
                          host='127.0.0.1',
                          database='db')
cnx_out = mysql.connector.connect(user='dbuser', password='thePassword',
                          host='127.0.0.1',
                          database='db')

ins_curs = cnx_out.cursor()

curs = cnx.cursor(dictionary=True)
#curs = cnx.cursor(dictionary=True,buffered=True) #fail

with open('sql\\getRawData.sql') as fh:
    sql = fh.read()

curs.execute(sql, params=None, multi=False)
result = curs.fetchall()  #<=== script stops at this point
print len(result) #<=== this line never executes

print curs.column_names

curs.close()
cnx.close()
cnx_out.close()
sys.exit()

The line curs.execute(sql, params=None, multi=False) succeeds on both the large and small databases. If I use curs.fetchone() in a loop, I can read all records.

If I alter the line:

curs = cnx.cursor(dictionary=True)

to read:

curs = cnx.cursor(dictionary=True,buffered=True)

The script hangs at curs.execute(sql, params=None, multi=False).

I can find no documentation on any limits to fetchall(), nor can I find any way to increase the buffer size, and no way to tell how large a buffer I even need.

There are no exceptions raised.

How can I resolve this?

like image 570
Vorpal Swordsman Avatar asked Apr 10 '15 01:04

Vorpal Swordsman


People also ask

What does cursor Fetchall () method do?

fetchall() Method. The method fetches all (or all remaining) rows of a query result set and returns a list of tuples. If no more rows are available, it returns an empty list. You must fetch all rows for the current query before executing new statements using the same connection.

What is Fetchone and Fetchall in Python?

First understand what is the use of fetchall, fetchmany(), fetchone(). cursor. fetchall() fetches all the rows of a query result. It returns all the rows as a list of tuples. An empty list is returned if there is no record to fetch.

Does Fetchall return tuple?

fetchall() returns a tuple on empty result.


1 Answers

I was having this same issue, first on a query that returned ~70k rows and then on one that only returned around 2k rows (and for me RAM was also not the limiting factor). I switched from using mysql.connector (i.e. the mysql-connector-python package) to MySQLdb (i.e. the mysql-python package) and then was able to fetchall() on large queries with no problem. Both packages seem to follow the python DB API, so for me MySQLdb was a drop-in replacement for mysql.connector, with no code changes necessary beyond the line that sets up the connection. YMMV if you're leveraging something specific about mysql.connector.

Pragmatically speaking, if you don't have a specific reason to be using mysql.connector the solution to this is just to switch to a package that works better!

like image 100
mactyr Avatar answered Sep 30 '22 03:09

mactyr