Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL join against date ranges?

Consider two tables:

Transactions, with amounts in a foreign currency:

     Date  Amount ========= =======  1/2/2009    1500  2/4/2009    2300 3/15/2009     300 4/17/2009    2200 etc. 

ExchangeRates, with the value of the primary currency (let's say dollars) in the foreign currency:

     Date    Rate ========= =======  2/1/2009    40.1  3/1/2009    41.0  4/1/2009    38.5  5/1/2009    42.7 etc. 

Exchange rates can be entered for arbitrary dates - the user could enter them on a daily basis, weekly basis, monthly basis, or at irregular intervals.

In order to translate the foreign amounts to dollars, I need to respect these rules:

A. If possible, use the most recent previous rate; so the transaction on 2/4/2009 uses the rate for 2/1/2009, and the transaction on 3/15/2009 uses the rate for 3/1/2009.

B. If there isn't a rate defined for a previous date, use the earliest rate available. So the transaction on 1/2/2009 uses the rate for 2/1/2009, since there isn't an earlier rate defined.

This works...

Select      t.Date,      t.Amount,     ConvertedAmount=(            Select Top 1              t.Amount/ex.Rate         From ExchangeRates ex         Where t.Date > ex.Date         Order by ex.Date desc     ) From Transactions t 

... but (1) it seems like a join would be more efficient & elegant, and (2) it doesn't deal with Rule B above.

Is there an alternative to using the subquery to find the appropriate rate? And is there an elegant way to handle Rule B, without tying myself in knots?

like image 745
Herb Caudill Avatar asked Feb 21 '10 16:02

Herb Caudill


2 Answers

You could first do a self-join on the exchange rates which are ordered by date so that you have the start and the end date of each exchange rate, without any overlap or gap in the dates (maybe add that as view to your database - in my case I'm just using a common table expression).

Now joining those "prepared" rates with the transactions is simple and efficient.

Something like:

WITH IndexedExchangeRates AS (                        SELECT  Row_Number() OVER (ORDER BY Date) ix,                     Date,                     Rate              FROM    ExchangeRates          ),         RangedExchangeRates AS (                          SELECT  CASE WHEN IER.ix=1 THEN CAST('1753-01-01' AS datetime)                      ELSE IER.Date                      END DateFrom,                     COALESCE(IER2.Date, GETDATE()) DateTo,                     IER.Rate              FROM    IndexedExchangeRates IER              LEFT JOIN IndexedExchangeRates IER2              ON IER.ix = IER2.ix-1          ) SELECT  T.Date,         T.Amount,         RER.Rate,         T.Amount/RER.Rate ConvertedAmount  FROM    Transactions T  LEFT JOIN RangedExchangeRates RER  ON (T.Date > RER.DateFrom) AND (T.Date <= RER.DateTo) 

Notes:

  • You could replace GETDATE() with a date in the far future, I'm assuming here that no rates for the future are known.

  • Rule (B) is implemented by setting the date of the first known exchange rate to the minimal date supported by the SQL Server datetime, which should (by definition if it is the type you're using for the Date column) be the smallest value possible.

like image 167
Lucero Avatar answered Sep 29 '22 14:09

Lucero


Suppose you had an extended exchange rate table that contained:

 Start Date   End Date    Rate  ========== ========== =======  0001-01-01 2009-01-31    40.1  2009-02-01 2009-02-28    40.1  2009-03-01 2009-03-31    41.0  2009-04-01 2009-04-30    38.5  2009-05-01 9999-12-31    42.7 

We can discuss the details of whether the first two rows should be combined, but the general idea is that it is trivial to find the exchange rate for a given date. This structure works with the SQL 'BETWEEN' operator which includes the ends of the ranges. Often, a better format for ranges is 'open-closed'; the first date listed is included and the second is excluded. Note that there is a constraint on the data rows - there are (a) no gaps in the coverage of the range of dates and (b) no overlaps in the coverage. Enforcing those constraints is not completely trivial (polite understatement - meiosis).

Now the basic query is trivial, and Case B is no longer a special case:

SELECT T.Date, T.Amount, X.Rate   FROM Transactions AS T JOIN ExtendedExchangeRates AS X        ON T.Date BETWEEN X.StartDate AND X.EndDate; 

The tricky part is creating the ExtendedExchangeRate table from the given ExchangeRate table on the fly. If it is an option, then revising the structure of the basic ExchangeRate table to match the ExtendedExchangeRate table would be a good idea; you resolve the messy stuff when the data is entered (once a month) instead of every time an exchange rate needs to be determined (many times a day).

How to create the extended exchange rate table? If your system supports adding or subtracting 1 from a date value to obtain the next or previous day (and has a single row table called 'Dual'), then a variation on this will work (without using any OLAP functions):

CREATE TABLE ExchangeRate (     Date    DATE NOT NULL,     Rate    DECIMAL(10,5) NOT NULL ); INSERT INTO ExchangeRate VALUES('2009-02-01', 40.1); INSERT INTO ExchangeRate VALUES('2009-03-01', 41.0); INSERT INTO ExchangeRate VALUES('2009-04-01', 38.5); INSERT INTO ExchangeRate VALUES('2009-05-01', 42.7); 

First row:

SELECT '0001-01-01' AS StartDate,        (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,        (SELECT Rate FROM ExchangeRate          WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate FROM Dual; 

Result:

0001-01-01  2009-01-31      40.10000 

Last row:

SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,        '9999-12-31' AS EndDate,        (SELECT Rate FROM ExchangeRate          WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate FROM Dual; 

Result:

2009-05-01  9999-12-31      42.70000 

Middle rows:

SELECT X1.Date     AS StartDate,        X2.Date - 1 AS EndDate,        X1.Rate     AS Rate   FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2        ON X1.Date < X2.Date  WHERE NOT EXISTS        (SELECT *           FROM ExchangeRate AS X3          WHERE X3.Date > X1.Date AND X3.Date < X2.Date         ); 

Result:

2009-02-01  2009-02-28      40.10000 2009-03-01  2009-03-31      41.00000 2009-04-01  2009-04-30      38.50000 

Note that the NOT EXISTS sub-query is rather crucial. Without it, the 'middle rows' result is:

2009-02-01  2009-02-28      40.10000 2009-02-01  2009-03-31      40.10000    # Unwanted 2009-02-01  2009-04-30      40.10000    # Unwanted 2009-03-01  2009-03-31      41.00000 2009-03-01  2009-04-30      41.00000    # Unwanted 2009-04-01  2009-04-30      38.50000 

The number of unwanted rows increases dramatically as the table increases in size (for N > 2 rows, there are (N-2) * (N - 3) / 2 unwanted rows, I believe).

The result for ExtendedExchangeRate is the (disjoint) UNION of the three queries:

SELECT DATE '0001-01-01' AS StartDate,        (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,        (SELECT Rate FROM ExchangeRate          WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate FROM Dual UNION SELECT X1.Date     AS StartDate,        X2.Date - 1 AS EndDate,        X1.Rate     AS Rate   FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2        ON X1.Date < X2.Date  WHERE NOT EXISTS        (SELECT *           FROM ExchangeRate AS X3          WHERE X3.Date > X1.Date AND X3.Date < X2.Date         ) UNION SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,        DATE '9999-12-31' AS EndDate,        (SELECT Rate FROM ExchangeRate          WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate FROM Dual; 

On the test DBMS (IBM Informix Dynamic Server 11.50.FC6 on MacOS X 10.6.2), I was able to convert the query into a view but I had to stop cheating with the data types - by coercing the strings into dates:

CREATE VIEW ExtendedExchangeRate(StartDate, EndDate, Rate) AS     SELECT DATE('0001-01-01')  AS StartDate,            (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,            (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate     FROM Dual     UNION     SELECT X1.Date     AS StartDate,            X2.Date - 1 AS EndDate,            X1.Rate     AS Rate       FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2            ON X1.Date < X2.Date      WHERE NOT EXISTS            (SELECT *               FROM ExchangeRate AS X3              WHERE X3.Date > X1.Date AND X3.Date < X2.Date             )     UNION      SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,            DATE('9999-12-31') AS EndDate,            (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate     FROM Dual; 
like image 37
Jonathan Leffler Avatar answered Sep 29 '22 15:09

Jonathan Leffler