I've been using SQL for years now but rarely anything more that simple inserts and selects etc... so I'm no SQL expert. I'm wondering if I could get some help in optimizing a more complex SQL statement that I'm executing on SQLite, from PHP through PDO.
The statement seems to work correctly, just seems to take longer that I would have expected (or perhaps I'm just expecting too much).
This is the SQL:
INSERT OR IGNORE INTO MailQueue(SubscriberID, TemplateID)
SELECT Subscribers.ID, '1' AS TemplateID
FROM Subscribers
INNER JOIN SubscriberGroups ON Subscribers.ID=SubscriberGroups.SubscriberID
WHERE SubscriberGroups.GroupID IN ('1', '2', '3')
AND Subscribers.ID NOT IN
(
SELECT Subscribers.ID FROM Subscribers
INNER JOIN SubscriberGroups ON Subscribers.ID=SubscriberGroups.SubscriberID
WHERE SubscriberGroups.GroupID IN ('4', '5', '6')
);
What I've got is a list of subscribers, in one or more groups. I want to add subscribers to a mail queue, selecting those that belong to one or more groups (1,2,3) but exclude those that are also in another set of groups (4,5,6).
Firstly, is the above SQL typical of how to do this?
Secondly, what indicies should I have to make this work as efficiently as possible?
Currently it takes about 30 seconds to get through about 5000 subscriber records (and a handful of groups) on a avg spec LAMP.
At the end of the day performance isn't that critical, but I'd like to understand this stuff better so any insight greatly appreciated.
Brad
The query optimizer in SQLite has basically two choices on how to implement this query. (There are actually six different choices, but we will only consider two of them here.) Pseudocode below demonstrating these two choices. The same indexes are used to speed up every loop in both implementation options.
Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive.
With Actian Zen, developers and product managers get all the advantages of SQLite but in a powerful, secure, and scalable engine that can run serverless or as a client-server. Actian Zen is orders of magnitude faster than SQLite.
Chances are the extra joins are killing you. What if you do:
SELECT Subscribers.ID, '1' AS TemplateID
FROM Subscribers
WHERE EXISTS( SELECT *
FROM SubscriberGroups
WHERE Subscribers.ID=SubscriberGroups.SubscriberID
AND SubscriberGroups.GroupID IN ('1', '2', '3') )
AND NOT EXISTS( SELECT *
FROM SubscriberGroups
WHERE Subscribers.ID=SubscriberGroups.SubscriberID
AND SubscriberGroups.GroupID IN ('4', '5', '6')
);
You'd also want to make sure that you have an index on SubscriberGroups( SubscriberID, GroupID )
My guess is that Subscribers already has an index on ID, right?
EDIT: Another option, which may or may not be faster. Look at the query plans of each to see...
This one may be a single index scan which could be faster than two index seeks, but it depends on SQLite's optimizer...
SELECT Subscribers.ID, '1' AS TemplateID
FROM Subscribers
INNER JOIN( SELECT SUM( CASE WHEN GroupID IN('1', '2', '3') THEN 1 ELSE 0 END ) AS inGroup,
SUM( CASE WHEN GroupID IN('4', '5', '6') THEN 1 ELSE 0 END ) AS outGroup,
SubscriberID
FROM SubscriberGroups
WHERE SubscriberGroups.GroupID IN ('1', '2', '3', '4', '5', '6' )
) SubscriberGroups
ON Subscribers.ID=SubscriberGroups.SubscriberID
AND inGroup > 0
AND outGroup = 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With