Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Join on nearest available date

I currently have these tables:

CREATE TABLE #SECURITY_TEMP (ID CHAR(30))
CREATE TABLE #SECURITY_TEMP_PRICE_HISTORY (ID CHAR(30), PRICEDATE DATE, PRICE FLOAT)
CREATE TABLE #SECURITY_POST (ID CHAR(30), SECPOS int)

INSERT INTO #SECURITY_TEMP (ID) VALUES ('APPL') ,('VOD'),('VOW3'), ('AAA')
INSERT INTO #SECURITY_TEMP_PRICE_HISTORY (ID,PRICEDATE, PRICE) VALUES 
('APPL', '20150101',10.4), ('APPL', '20150116',15.4), ('APPL', '20150124',22.4), 
('VOD', '20150101', 30.5), ('VOD', '20150116',16.5), ('VOD', '20150124',16.5), 
('VOW3', '20150101', 45.5), ('VOW3', '20150116',48.8) ,('VOW3', '20150124',50.55), 
('AAA', '20100118', 0.002)

INSERT INTO #SECURITY_POST (ID,SECPOS) VALUES ('APPL', 100), ('VOD', 350), ('VOW3', 400)

I want to have a clean table that shows me the security ID, the security position and the latest available price for that security when a date is passed.

Now when I do the following:

SELECT sec.ID, sec.SECPOS, t.PRICE
FROM #SECURITY_POST as SEC INNER JOIN #SECURITY_TEMP_PRICE_HISTORY as t
ON sec.ID = t.ID
WHERE t.PriceDate = '20150101'
GROUP BY sec.ID, secPos, t.price

I get the correct result

 1. ID  SECPOS  PRICE 
 2. APPL  100   10.4
 3. VOD   350   30.5
 4. VOW3  400   45.5

However, there may be individual circumstances where, the price of a stock is not available. In that sense, I therefore want to be able to get the most recent price available.

Doing

SELECT sec.ID, sec.SECPOS, t.PRICE  
FROM #SECURITY_POST as SEC INNER JOIN
     #SECURITY_TEMP_PRICE_HISTORY as t
     ON sec.ID = t.ID
WHERE t.PriceDate = '20150117'
GROUP BY sec.ID, secPos, t.price

Returns 0 rows because of no data, and doing

SELECT sec.ID, sec.SECPOS, t.PRICE  
FROM #SECURITY_POST as SEC INNER JOIN
     #SECURITY_TEMP_PRICE_HISTORY as t
     ON sec.ID = t.ID
WHERE t.PriceDate <= '20150117'
GROUP BY sec.ID, sec.secPos, t.price
HAVING sec.secpos <> 0

Returns duplicate rows.

I have tried loads of different methodologies and I just cannot get the output I want. Furthermore, I would also like to be able to get one column with the price nearest a date (call it START_DATE) and one column with the price nearest a second date (call it END_DATE) and one column that is going to be the position Price@END_DATE - Price@START_DATE. The price is always taken from the same #SECURITY_TEMP_PRICE_HISTORY.

However, my SQL knowledge is just embarrassing, and I could not figure out a good efficient way of doing this. Any help would be appreciated. Please also note that the #SECURITY_PRICE_HISTORY table may contain more securities than the #SECURITY_POST Table.

like image 557
buellboy Avatar asked Jul 24 '15 15:07

buellboy


People also ask

What does (+) mean in SQL join?

The plus sign is Oracle syntax for an outer join. There isn't a minus operator for joins. An outer join means return all rows from one table. Also return the rows from the outer joined where there's a match on the join key. If there's no matching row, return null.

How do I find the nearest value in SQL?

SQL uses the CEILING function to perform this computation. It takes a single argument: the column whose values you'd like to round up to the nearest integer.

What is join elimination?

The join elimination rewrite optimization reduces the join degree of the query by eliminating tables from the query when it is safe to do so. Typically, this optimization is used when the query contains a primary key-foreign key join, and only primary key columns from the primary table are referenced in the query.


1 Answers

This should do the trick. OUTER APPLY is a join operator that (like CROSS APPLY) allows a derived table to have an outer reference.

SELECT
   s.ID,
   s.SecPos,
   t.Price
   t.PriceDate
FROM
   #SECURITY_POST s
   OUTER APPLY (
      SELECT TOP 1 *
      FROM #SECURITY_TEMP_PRICE_HISTORY t
      WHERE
         s.ID = t.ID
         AND t.PriceDate <= '20150117'
      ORDER BY t.PriceDate DESC
   ) t
;

You may also want to consider flagging security prices that are very old, or limiting the lookup for the most recent security to a certain period (a week or a month or something).

Make sure that your price history table has an index with (ID, PriceDate) so that the subquery lookups can use range seeks and your performance can be good. Make sure you do any date math on the security table, not the history table, or you will force the price-lookup subquery to be non-sargable, which would be bad for performance as the range seeks would not be possible.

If no price is found for the security, OUTER APPLY will still allow the row to exist, so the price will show as NULL. If you want securities to not be shown when no appropriate price is found, use CROSS APPLY.

For your second part of the question, you can do this with two OUTER APPLY operations, like so:

DECLARE
   @StartDate date = '20150101',
   @EndDate date = '20150118';

SELECT
   S.ID,
   S.SecPos,
   StartDate = B.PriceDate,
   StartPrice = B.Price,
   EndDate = E.PriceDate,
   EndPrice = E.Price,
   Position = B.Price - E.Price
FROM
   #SECURITY_POST S
   OUTER APPLY (
      SELECT TOP 1 *
      FROM #SECURITY_TEMP_PRICE_HISTORY B
      WHERE
         S.ID = B.ID
         AND B.PriceDate <= @StartDate
      ORDER BY B.PriceDate DESC
   ) B
   OUTER APPLY (
      SELECT TOP 1 *
      FROM #SECURITY_TEMP_PRICE_HISTORY E
      WHERE
         S.ID = E.ID
         AND E.PriceDate <= @EndDate
      ORDER BY E.PriceDate DESC
   ) E
;

With your data this yields the following result set:

ID    SecPos  StartDate   StartPrice  EndDate     EndPrice  Position
----  ------  ----------  ----------  ----------  --------  --------
APPL  100     2015-01-01  10.4        2015-01-16  15.4       -5
VOD   350     2015-01-01  30.5        2015-01-16  16.5       14
VOW3  400     2015-01-01  45.5        2015-01-16  48.8       -3.3

Last, while not all agree, I would encourage you to name your ID columns with the table name as in SecurityID instead of ID. In my experience the use of ID only leads to problems.

Note: there is a way to solve this problem using the Row_Number() windowing function. If you have relatively few price points compared to the number of stocks, and you're looking up prices for most of the stocks in the history table, then you might get better performance with that method. However, if there are a great number of price points per stock, or you're filtering to just a few stocks, you may get better performance with the method I've shown you.

like image 154
ErikE Avatar answered Sep 24 '22 03:09

ErikE