Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this SQL (join) query slow when joining on a column that uses select max?

Tags:

sql

derby

I'm using Apache Derby 10.8 if it makes a difference.

I have a very simple database with a table full of items and a table full of bids on those items. I want to select every item with the highest bid for that item joined to it. The following is my first try at it and the performance is awful:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from item
    left outer join bid on bid.item_id = item.id and
    bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')

I created a set of test data that uses 282 items with 200 bids for each item (56400 bids total). The above query takes around 30-40 seconds to run. If I select every item and manually loop through the items selecting high bids for each, it takes less than a second.

I've tried indexing the bid.bid_amount and bid.status columns, but it didn't do anything noticeable. SQL isn't my strongest area, so if anyone is willing to explain why that query is so slow I'd really appreciate it.

like image 694
Ryan J Avatar asked Dec 21 '22 03:12

Ryan J


1 Answers

The query's slow because you're doing what's called a correlated subquery--it's running that max for each row.

Try something like this:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from 
    item
    left outer join (
        select 
            item_id, 
            MAX(bid_amount) maxamount 
        from 
            bid 
        where 
            status = 'OK' 
        group by 
            item_id
    ) b1 on
        item.id = b1.item_id
    left outer join bid on
        bid.item_id = item.id
        and bid.bid_amount = b1.maxamount

This subquery is only run once, and it will go much faster.

like image 188
Eric Avatar answered Dec 25 '22 22:12

Eric