I'm using Apache Derby 10.8 if it makes a difference.
I have a very simple database with a table full of items and a table full of bids on those items. I want to select every item with the highest bid for that item joined to it. The following is my first try at it and the performance is awful:
select
item.id as item_id,
item.name as item_name,
item.retail_value as item_retail_value,
item.vendor as item_vendor,
bid.bid_amount as bid_amount,
bid.bidder_name as bid_bidder_name,
bid.bidder_phone as bid_bidder_phone,
bid.operator_name as bid_operator_name
from item
left outer join bid on bid.item_id = item.id and
bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')
I created a set of test data that uses 282 items with 200 bids for each item (56400 bids total). The above query takes around 30-40 seconds to run. If I select every item and manually loop through the items selecting high bids for each, it takes less than a second.
I've tried indexing the bid.bid_amount
and bid.status
columns, but it didn't do anything noticeable. SQL isn't my strongest area, so if anyone is willing to explain why that query is so slow I'd really appreciate it.
The query's slow because you're doing what's called a correlated subquery--it's running that max
for each row.
Try something like this:
select
item.id as item_id,
item.name as item_name,
item.retail_value as item_retail_value,
item.vendor as item_vendor,
bid.bid_amount as bid_amount,
bid.bidder_name as bid_bidder_name,
bid.bidder_phone as bid_bidder_phone,
bid.operator_name as bid_operator_name
from
item
left outer join (
select
item_id,
MAX(bid_amount) maxamount
from
bid
where
status = 'OK'
group by
item_id
) b1 on
item.id = b1.item_id
left outer join bid on
bid.item_id = item.id
and bid.bid_amount = b1.maxamount
This subquery is only run once, and it will go much faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With