Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sqlite3 Module in Python far Slower SELECT than in Shell

I am using the sqlite3 module in Python but am finding it incredibly slow for a certain SELECT query relative to running the query in sqlite3 in a command shell. I will start off by saying that both versions are the same 3.7.17.

My query is

SELECT r.ID, r.Date FROM my_table r
WHERE
r.Date IN (SELECT Date FROM my_table WHERE ID = r.ID GROUP BY Date LIMIT 2);

The Python code is

con = lite.connect(path_to_database)
cur = con.cursor()

with con:
    cur.execute(sql_query)

where sql_query is a string variable containing the initial query.

I'm assuming the problem is in optimising the IN subquery.

Performance details: my_table contains 167000 records, the query in the shell takes ~10 seconds, the query in Python takes > 5 minutes (I stopped it when it got this far).

Currently as it is table creation I am just copying and pasting code into the shell as a workaround, how can I fix this so that I can run the query from Python?

ADDITION

When I run EXPLAIN QUERY PLAN I get the following

Shell:

0           0           0           SCAN TABLE PIT_10_Days AS r (~500000 rows)
0           0           0           EXECUTE CORRELATED LIST SUBQUERY 1
1           0           0           SEARCH TABLE PIT_10_Days USING AUTOMATIC C
1           0           0           USE TEMP B-TREE FOR GROUP BY

Python:

0           0           TABLE PIT_10_Days AS r 
0           0           TABLE PIT_10_Days

I'm not sure if the difference is a problem with getting EXPLAIN QUERY PLAN in Python or if it is actually the problem itself.

like image 660
rwolst Avatar asked May 06 '14 11:05

rwolst


People also ask

Why do we use sqlite3 in Python?

SQLite is a C library that provides a lightweight disk-based database that doesn't require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage.

How to connect sqlite3 with Python?

Connect To Database#!/usr/bin/python import sqlite3 conn = sqlite3. connect('test. db') print "Opened database successfully"; Here, you can also supply database name as the special name :memory: to create a database in RAM.

Does Python have sqlite3?

Python has a library to access SQLite databases, called sqlite3, intended for working with this database which has been included with Python package since version 2.5.

What does sqlite3 connect return?

The sqlite3. connect() function returns a Connection object that we will use to interact with the SQLite database held in the file aquarium.


1 Answers

I'm sorry to be so late, but I have only now found this question.
Unfortunately I have no idea why the sqlite3 module behaves differently than the shell but you could try to avoid the correlated query from the first place. I'm not sure whether it always does what you want anyway because you do not order the results in your subquery.

I suppose you want the two latest dates for each ID? Try this:

SELECT r.ID AS ID, max( r.Date ) AS Date
 FROM my_table AS r
 GROUP BY r.ID

UNION

SELECT r.ID, max( r.Date )
  FROM 
      my_table AS r
      JOIN ( 
        SELECT ID,
               max( Date ) AS Date
          FROM my_table
         GROUP BY ID) AS maxDat
      ON
      r.ID = maxDat.ID AND
      r.Date != maxDat.Date
 GROUP BY r.ID;

It selects the IDs together with their latest Date. Then it unifies this result with a similar selection from a table where the actual latest date is taken out so that you will get the second latest date. If you need more than the latest two dates this will get pretty cumbersome, but for two dates only it should be okay and probably much faster.

like image 200
swenzel Avatar answered Oct 04 '22 20:10

swenzel