Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to get max date before a given date

Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.

So now I'm looking at the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want, is the latest known price on that day, which will be the price for 7 May.

Here's the code:

select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
    inner join Price v
        on v.FundCode = d.FundCode
        and v.PriceDate = (
            select max(PriceDate)
            from Price
            where FundCode = v.FundCode
            /* */ and PriceDate < d.TransactionDate 
        )

It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.

The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.

I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?

like image 719
Cobus Kruger Avatar asked May 23 '11 14:05

Cobus Kruger


2 Answers

You don't specify the version of SQL Server you're using, but if you are using a version with support for ranking functions and CTE queries I think you'll find this quite a bit more performant than using a correlated subquery within your join statement.

It should be very similar in performance to Andriy's queries. Depending on the exact index topography of your tables, one approach might be slightly faster than another.

I tend to like CTE-based approaches because the resulting code is quite a bit more readable (in my opinion). Hope this helps!

;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
    SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate ASC) AS Match_val
    FROM Transaction d
        INNER JOIN Price v
            ON v.FundCode = d.FundCode
    WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
    INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1
like image 193
mwigdahl Avatar answered Nov 11 '22 16:11

mwigdahl


There's a method for finding rows with maximum or minimum values, which involves LEFT JOIN to self, rather than more intuitive, but probably more costly as well, INNER JOIN to a self-derived aggregated list.

Basically, the method uses this pattern:

SELECT t.*
FROM t
  LEFT JOIN t AS t2 ON t.key = t2.key
    AND t2.Value > t.Value  /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL

or its NOT EXISTS counterpart:

SELECT *
FROM t
WHERE NOT EXISTS (
  SELECT *
  FROM t AS t2
  WHERE t.key = t2.key
    AND t2.Value > t.Value  /* same as above applies to ">" here as well */
)

So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.

When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.

So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:

SELECT
  d.TransactionID,
  d.FundCode,
  d.TransactionDate,
  v.OfferPrice
FROM Transaction d
  INNER JOIN Price v ON v.FundCode = d.FundCode
  LEFT JOIN Price v2 ON v2.FundCode = v.FundCode  /* this and */
    AND v2.PriceDate > v.PriceDate                /* this are where we are applying
                                                       the above method; */
    AND v2.PriceDate < d.TransactionDate          /* and this is where we are limiting
                                                       the maximum value */
WHERE v2.FundCode IS NULL

And here's a similar solution with NOT EXISTS:

SELECT
  d.TransactionID,
  d.FundCode,
  d.TransactionDate,
  v.OfferPrice
FROM Transaction d
  INNER JOIN Price v ON v.FundCode = d.FundCode
  WHERE NOT EXISTS (
    SELECT *
    FROM Price v2
    WHERE v2.FundCode = v.FundCode           /* this and */
      AND v2.PriceDate > v.PriceDate         /* this are where we are applying
                                                the above method; */
      AND v2.PriceDate < d.TransactionDate   /* and this is where we are limiting
                                                the maximum value */
  )
like image 44
Andriy M Avatar answered Nov 11 '22 16:11

Andriy M