Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Very slow subqueries when using "NOT IN"

I'm working on generating reports for data contained within a large pre-existing Access database (~500 mb after compact & repair), and I'm having trouble with a slow subquery.

The database has a big table which contains a record of every customer purchase. Here's a simple query which finds customers who have bought a blue widget. It completes within a few seconds and returns about ten thousand records.

SELECT DISTINCT CustomerId 
FROM ProductSales
WHERE Product = 'BLUE' 

Here's a query which tries to find customers who have bought a blue widget, but not a red widget. It takes about an hour to run.

SELECT DISTINCT CustomerId FROM ProductSales
WHERE Product = 'BLUE' 
AND CustomerId NOT IN (
    SELECT CustomerId 
    FROM ProductSales 
    WHERE Product = 'RED'
)

Is there a way to refactor the second query to make it take a few minutes instead of an hour?

like image 767
James Avatar asked Aug 09 '11 16:08

James


1 Answers

Access' database engine can't use an index for Not In, so it's bound to be slow. With an index on CustomerId, this query should be much faster because the db engine can use the index.

SELECT DISTINCT blue.CustomerId
FROM
    ProductSales AS blue
    LEFT JOIN
        (
            SELECT CustomerId 
            FROM ProductSales 
            WHERE Product = 'RED'
        ) AS red
    ON blue.CustomerId = red.CustomerId
WHERE
        blue.Product = 'BLUE'
    AND red.CustomerId Is Null; 

You could probably also try a Not Exists approach, but index use there is not guaranteed. Also, please see the comment below from David Fenton which discusses performance impact in more detail.

like image 109
HansUp Avatar answered Nov 07 '22 15:11

HansUp