
Optimizing an embedded SELECT query in MySQL

Here's a query that I am running right now on a table that has 45,000 records and is 65 MB in size. The table is only going to get bigger, so I have to think about future performance as well:

SELECT count(payment_id) as signup_count, sum(amount) as signup_amount
FROM payments p
WHERE tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
AND completed > 0
AND tm_completed IS NOT NULL
AND member_id NOT IN (SELECT p2.member_id FROM payments p2 WHERE p2.completed=1 AND p2.tm_completed < '2009-05-01' AND p2.tm_completed IS NOT NULL GROUP BY p2.member_id)

As you might imagine, it chokes the MySQL server to a standstill.

What it does is pull the number of new members who signed up in the period, along with the total amount they paid. A row qualifies if it is a "completed" payment with tm_completed set (that column is only populated for completed payments) and, per the embedded SELECT, the member has never had a "completed" payment before the period, meaning he's a new member. The system does rebills and whatnot, so this is the only way to differentiate between an existing member who just got rebilled and a new member who got billed for the first time.

Is there any way to optimize this query so that it uses fewer resources and stops bringing my MySQL server to its knees?

Am I missing any info to clarify this any further? Let me know...

EDIT:

Here are the indexes already on that table:

Keyname       Type     Cardinality  Field
PRIMARY       PRIMARY  46757        payment_id
member_id     INDEX    23378        member_id
payer_id      INDEX    11689        payer_id
coupon_id     INDEX    1            coupon_id
tm_added      INDEX    46757        tm_added, product_id
tm_completed  INDEX    46757        tm_completed, product_id

Crazy Serb asked May 30 '09 05:05


3 Answers

Those kinds of IN subqueries are a bit slow in MySQL. I would rephrase it like this:

SELECT COUNT(1) AS signup_count, SUM(amount) AS signup_amount
FROM   payments p
WHERE  tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
AND    completed > 0
AND    NOT EXISTS (
           SELECT member_id
           FROM   payments
           WHERE  member_id = p.member_id
           AND    completed = 1
           AND    tm_completed < '2009-05-01');

The check 'tm_completed IS NOT NULL' is not necessary as that is implied by your BETWEEN condition.
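One side note on the date range, offered as a sketch rather than a definitive fix. The schema is not shown, so this assumes tm_completed is a DATETIME column and that the intended period is the whole of May 2009. Under that assumption '2009-05-30' is read as 2009-05-30 00:00:00, so the BETWEEN condition skips nearly all of May 30 and all of May 31. A half-open range avoids that:

```sql
-- Assumption: tm_completed is DATETIME and the period is all of May 2009.
-- ">= first day" and "< first day of next month" covers every moment
-- in the month without having to spell out the last second of May 31.
SELECT COUNT(1) AS signup_count, SUM(amount) AS signup_amount
FROM   payments p
WHERE  tm_completed >= '2009-05-01'
AND    tm_completed <  '2009-06-01'
AND    completed > 0
AND    NOT EXISTS (
           SELECT 1
           FROM   payments p2
           WHERE  p2.member_id = p.member_id
           AND    p2.completed = 1
           AND    p2.tm_completed < '2009-05-01');
```

If the shorter period was intentional, the original BETWEEN bounds should stay as they are.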

Also make sure you have an index on:

(tm_completed, completed)
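A minimal sketch of how that could look. The index names are made up for illustration. The second index is my own assumption rather than part of the suggestion above: it targets the correlated subquery, which looks rows up by member_id. Running EXPLAIN before and after shows whether the optimizer actually uses either one.

```sql
-- The index suggested above, for the outer date-range filter.
-- idx_tm_completed_completed is a hypothetical name.
CREATE INDEX idx_tm_completed_completed
    ON payments (tm_completed, completed);

-- Assumption: a composite index so the NOT EXISTS probe can find
-- "earlier completed payment for this member" from the index alone.
CREATE INDEX idx_member_completed_tm
    ON payments (member_id, completed, tm_completed);

-- Inspect the execution plan of the rewritten query.
EXPLAIN
SELECT COUNT(1) AS signup_count, SUM(amount) AS signup_amount
FROM   payments p
WHERE  tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
AND    completed > 0
AND    NOT EXISTS (
           SELECT member_id
           FROM   payments
           WHERE  member_id = p.member_id
           AND    completed = 1
           AND    tm_completed < '2009-05-01');
```

Every extra index adds some cost to inserts and updates on payments, so it is worth keeping only the ones that EXPLAIN shows in the key column.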
cletus answered Sep 19 '22 12:09


I had fun putting together this solution which does not require a subquery:

SELECT count(p1.payment_id) as signup_count,
       sum(p1.amount)       as signup_amount
  FROM payments p1
       LEFT JOIN payments p2
              ON p1.member_id = p2.member_id
             AND p2.completed = 1
             AND p2.tm_completed < date '2009-05-01'
 WHERE p1.completed > 0
   AND p1.tm_completed between date '2009-05-01' and date '2009-05-30'
   AND p2.member_id IS NULL;
mechanical_meat answered Sep 17 '22 12:09


Avoid using IN with a subquery; MySQL does not optimize these well (though there are pending optimizations in 5.4 and 6.0 regarding this). Rewriting this as a join will probably get you a performance boost:

SELECT count(p.payment_id) as signup_count, sum(p.amount) as signup_amount
FROM payments p
LEFT JOIN (SELECT p2.member_id
          FROM payments p2
          WHERE p2.completed=1
          AND p2.tm_completed < '2009-05-01'
          AND p2.tm_completed IS NOT NULL
          GROUP BY p2.member_id) foo
ON p.member_id = foo.member_id
WHERE p.tm_completed BETWEEN '2009-05-01' AND '2009-05-30'
AND p.completed > 0
AND p.tm_completed IS NOT NULL
-- Anti-join filter: keep only members with no earlier completed payment.
-- This test belongs in WHERE; inside ON it would stop any row from matching.
AND foo.member_id IS NULL

Second, I would have to see your table schema; are you using indexes?

Todd Gardner answered Sep 20 '22 12:09