
Python slow on fetchone, hangs on fetchall

Tags: python, sqlite

I'm writing a script that runs a SELECT query against a database and parses through ~33,000 records. Unfortunately, I'm running into problems at the cursor.fetchone()/cursor.fetchall() stage.

I first tried iterating through the cursor a record at a time like so:

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
while True:
    # Get the next row in the cursor
    row = cursor.fetchone()
    if row is None:
        break

    # TODO: Determine if there's any kanji in row[2]

    weight = float(row[3] + row[4]) / 2
    printStatus("Weight: " + str(weight))

Based on the output of printStatus (it prints out a timestamp plus whatever string is passed to it), the script took approximately 1 second to process each row. This led me to believe that the query was being re-run each time the loop iterated (with a LIMIT 1 or something), since it took ~1 second for the same query to run once in something like SQLiteStudio *and* return all 33,000 rows. I calculated that, at that rate, it would take around 7 hours to get through all 33,000 records.

Rather than sit through that, I tried cursor.fetchall():

results = cursor.fetchall()

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
for row in results:
    # TODO: Determine if there's any kanji in row[2]

    weight = float(row[3] + row[4]) / 2
    printStatus("Weight: " + str(weight))

Unfortunately, the Python executable locked up at 25% CPU and ~6MB of RAM when it got to the cursor.fetchall() line. I left the script running for ~10 minutes, but nothing happened.

Is ~33,000 returned rows (about 5MB of data) too much for Python to grab at once? Am I stuck iterating through one at a time? Or is there something I can do to speed things up?

EDIT: Here's some console output

12:56:26.019: Adding new column 'weight' and related index to r_ele
12:56:26.019: Querying database
12:56:28.079: Starting weight calculations
12:56:28.079: Weight: 1.0
12:56:28.079: Weight: 0.5
12:56:28.080: Weight: 0.5
12:56:28.338: Weight: 1.0
12:56:28.339: Weight: 3.0
12:56:28.843: Weight: 1.5
12:56:28.844: Weight: 1.0
12:56:28.844: Weight: 0.5
12:56:28.844: Weight: 0.5
12:56:28.845: Weight: 0.5
12:56:29.351: Weight: 0.5
12:56:29.855: Weight: 0.5
12:56:29.856: Weight: 1.0
12:56:30.371: Weight: 0.5
12:56:30.885: Weight: 0.5
12:56:31.146: Weight: 0.5
12:56:31.650: Weight: 1.0
12:56:32.432: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.952: Weight: 1.0
12:56:33.454: Weight: 0.5
12:56:33.455: Weight: 0.5
12:56:33.455: Weight: 1.0
12:56:33.716: Weight: 0.5
12:56:33.716: Weight: 1.0

And here's the SQL query:

//...snip (it wasn't the culprit)...

The output of EXPLAIN QUERY PLAN from SQLiteStudio:

0   0   0   SCAN TABLE r_ele AS re USING COVERING INDEX r_ele_fk (~500000 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 1
1   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 2
2   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
2   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 3
3   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
3   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 4
4   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 5
5   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 6
6   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 7
7   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
7   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 8
8   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
8   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 9
9   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
Asked Aug 20 '13 by IAmKale


1 Answer

SQLite computes result records on the fly: each row is produced only when you ask for it. fetchone is slow because it has to execute all of the correlated subqueries for each record in r_ele. fetchall is no faster; it takes just as long in total as calling fetchone for every record, which is why it appeared to hang.
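As a minimal sketch of that equivalence (using a throwaway in-memory table, not your schema), draining a cursor with fetchone does the same work as fetchall:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

# Drain the cursor one row at a time...
cur.execute("SELECT x FROM t ORDER BY x")
rows_one = []
while True:
    row = cur.fetchone()
    if row is None:
        break
    rows_one.append(row)

# ...which does exactly the same work as fetchall(); neither can
# skip the per-row cost of evaluating the SELECT.
cur.execute("SELECT x FROM t ORDER BY x")
rows_all = cur.fetchall()

assert rows_one == rows_all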

SQLite 3.7.13 (the version used by SQLiteStudio) estimates that all the lookups on the value column would be horribly slow, and therefore creates a temporary index for this query. The older SQLite 3.6.21 bundled with your Python does not do this automatically, so you should create a permanent index that it can use:

CREATE INDEX idx_k_ele_value ON k_ele(value);
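You can also create it from the script itself before running the big SELECT; a quick sketch (the database filename is a placeholder):

import sqlite3

conn = sqlite3.connect("dictionary.db")  # placeholder filename
cur = conn.cursor()
# IF NOT EXISTS makes this safe to run on every startup
cur.execute("CREATE INDEX IF NOT EXISTS idx_k_ele_value ON k_ele(value)")
conn.commit()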

If that does not help, upgrade to a Python build that ships a newer SQLite, or use another database library with a newer SQLite built in, such as APSW.
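To check which SQLite version your Python is actually linked against:

import sqlite3

# Version of the SQLite library itself, not of the sqlite3 module
print(sqlite3.sqlite_version)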

Answered Oct 29 '22 by CL.