I am using the sqlite3 module in Python but am finding it incredibly slow for a certain SELECT query relative to running the query in sqlite3 in a command shell. I will start off by saying that both versions are the same 3.7.17.
My query is
SELECT r.ID, r.Date FROM my_table r
WHERE
r.Date IN (SELECT Date FROM my_table WHERE ID = r.ID GROUP BY Date LIMIT 2);
The Python code is
con = lite.connect(path_to_database)
cur = con.cursor()
with con:
cur.execute(sql_query)
where sql_query
is a string variable containing the initial query.
I'm assuming the problem is in optimising the IN
subquery.
Performance details: my_table
contains 167000 records, the query in the shell takes ~10 seconds, the query in Python takes > 5 minutes (I stopped it when it got this far).
Currently as it is table creation I am just copying and pasting code into the shell as a workaround, how can I fix this so that I can run the query from Python?
ADDITION
When I run EXPLAIN QUERY PLAN
I get the following
Shell:
0 0 0 SCAN TABLE PIT_10_Days AS r (~500000 rows)
0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
1 0 0 SEARCH TABLE PIT_10_Days USING AUTOMATIC C
1 0 0 USE TEMP B-TREE FOR GROUP BY
Python:
0 0 TABLE PIT_10_Days AS r
0 0 TABLE PIT_10_Days
I'm not sure if the difference is a problem with getting EXPLAIN QUERY PLAN
in Python or if it is actually the problem itself.
SQLite is a C library that provides a lightweight disk-based database that doesn't require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage.
Connect To Database#!/usr/bin/python import sqlite3 conn = sqlite3. connect('test. db') print "Opened database successfully"; Here, you can also supply database name as the special name :memory: to create a database in RAM.
Python has a library to access SQLite databases, called sqlite3, intended for working with this database which has been included with Python package since version 2.5.
The sqlite3. connect() function returns a Connection object that we will use to interact with the SQLite database held in the file aquarium.
I'm sorry to be so late, but I have only now found this question.
Unfortunately I have no idea why the sqlite3 module behaves differently than the shell but
you could try to avoid the correlated query from the first place. I'm not sure whether it always does what you want anyway because you do not order the results in your subquery.
I suppose you want the two latest dates for each ID? Try this:
SELECT r.ID AS ID, max( r.Date ) AS Date
FROM my_table AS r
GROUP BY r.ID
UNION
SELECT r.ID, max( r.Date )
FROM
my_table AS r
JOIN (
SELECT ID,
max( Date ) AS Date
FROM my_table
GROUP BY ID) AS maxDat
ON
r.ID = maxDat.ID AND
r.Date != maxDat.Date
GROUP BY r.ID;
It selects the IDs together with their latest Date. Then it unifies this result with a similar selection from a table where the actual latest date is taken out so that you will get the second latest date. If you need more than the latest two dates this will get pretty cumbersome, but for two dates only it should be okay and probably much faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With