Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join two tables on the same date or closest date (before or after)

I just got great help from Gord Thompson on a similar question (Combine two tables by joining on the same date or closest prior date (not just exact matches)) but now realize my data is not what I expected. It turns out I can have Lead_Dates later than Product_Interest_Dates and this is causing the previous SQL code to drop those cases. More specifically:

I have two tables:

  • CustomerID
  • Lead_Date
  • Lead_Source

and

  • CustomerID
  • Product_Interest_Date
  • Product_Interest

I want two create a single table where, for each CustomerID, each Product_Interest is connected to a the Lead_Source that is the closest date (either before or after). The final table would be:

CustomerID
Product_Interest_Date
Product_Interest
Lead_Date (the closest entry in time to Product_Interest_Date)
Lead_Source (the Lead_Source of the closest Lead_Date)

I studied Gord's code but cannot bring this home. Following his example, graphically I want this: http://i.stack.imgur.com/4ZVDV.jpg

The SQL for the sequence Stack Overflow NEW 1

SELECT
pi.CustomerID, 
pi.Product_Interest_Date,
l.Lead_Date, 
Abs(pi.Product_Interest_Date-l.Lead_Date) AS Date_Gap

FROM 
Test_PI pi
INNER JOIN 
Test_Leads l

Stack Overflow NEW 2

SELECT 
[Stack Overflow NEW 1].CustomerID,
[Stack Overflow NEW 1].Product_Interest_Date, 
Min([Stack Overflow NEW 1].Date_Gap) AS MinOfDate_Gap
FROM [Stack Overflow NEW 1]
GROUP BY [Stack Overflow NEW 1].CustomerID, 
[Stack Overflow NEW 1].Product_Interest_Date;

Final

SELECT Test_PI.CustomerID, 
       Test_PI.Product_Interest_Date, 
       Test_PI.Product_Interest, 
       Test_Leads.Lead_Date, 
       Test_Leads.Lead_Source
FROM (Test_PI INNER JOIN ([Stack Overflow NEW 2] 
 INNER JOIN [Stack Overflow NEW 1] 
   ON ([Stack Overflow NEW 2].CustomerID = [Stack Overflow NEW 1].CustomerID) 
    AND ([Stack Overflow NEW 2].Product_Interest_Date = [Stack Overflow NEW 1].Product_Interest_Date) 
    AND ([Stack Overflow NEW 2].MinOfDate_Gap = [Stack Overflow NEW 1].Date_Gap)) 
  ON (Test_PI.CustomerID = [Stack Overflow NEW 2].CustomerID) 
   AND (Test_PI.Product_Interest_Date = [Stack Overflow NEW 2].Product_Interest_Date))
 INNER JOIN Test_Leads 
 ON ([Stack Overflow NEW 1].CustomerID = Test_Leads.CustomerID) 
  AND ([Stack Overflow NEW 1].Lead_Date = Test_Leads.Lead_Date)
GROUP BY Test_PI.CustomerID, Test_PI.Product_Interest_Date, Test_PI.Product_Interest, Test_Leads.Lead_Date, Test_Leads.Lead_Source;

I tried to combine all these into a single code, and cannot get past the SQL FROM error! This is my specific question, how do I write this in a single SQL code?

SELECT 
Test_PI.CustomerID,
Test_PI.Product_Interest_Date,
Test_PI.Product_Interest,
Test_Leads.Lead_Date,
Test_Leads.Lead_Source
FROM
(Test_PI 
INNER JOIN
   ( (SELECT 
             latest.CustomerID, 
             latest.Product_Interest_Date, 
             Min(latest.Date_Gap) AS Min_Date_Gap
      FROM 
           latest 
     ) latest1
INNER JOIN
  (SELECT 
             pi.CustomerID, 
             pi.Product_Interest_Date, 
             l.Lead_Date,
             Abs(pi.Product_Interest_Date - l.Lead_Date) AS Date_Gap
      FROM 
            Test_PI pi 
       INNER JOIN 
            Test_Leads l
       ON pi.CustomerID = l.CustomerID 
    ) latest
   ) 
 ON Test_PI.CustomerID = latest1.CustomerID AND Test_PI.Product_Interest_Date = latest1.Product_Interest_Date

INNER JOIN 
  Test_Leads
    ON Test_Leads.CustomerID = latest1.CustomerID
        AND Test_Leads.Lead_Date = latest1.Lead_Date
like image 784
scminne Avatar asked Oct 16 '14 23:10

scminne


People also ask

What is the most efficient way of joining 2 table in same database?

Method 1: Relational Algebra Relational algebra is the most common way of writing a query and also the most natural way to do so. The code is clean, easy to troubleshoot, and unsurprisingly, it is also the most efficient way to join two tables.

Does WHERE clause go before or after join?

If you move the same filter to the WHERE clause, you will notice that the filter happens after the tables are joined.

Does it matter what order you join tables in?

1 Answer. The order doesn't matter for INNER joins. As long as you change your selects from SELECT * to SELECT a.

How do you join two tables based on conditions?

You join two tables by creating a relationship in the WHERE clause between at least one column from one table and at least one column from another. The join creates a temporary composite table where each pair of rows (one from each table) that satisfies the join condition is linked to form a single row.


1 Answers

Now that we're considering both past and future [Lead_Date] values I've tweaked the test data cover a special case

Table: Test_PI

CustomerID  Product_Interest_Date  Product_Interest
----------  ---------------------  ----------------
         1  2014-09-07             Interest1       
         1  2014-09-08             Interest2       
         1  2014-09-15             Interest3       
         1  2014-09-28             Interest4       

Table: Test_Leads

CustomerID  Lead_Date   Lead_Source
----------  ----------  -----------
         1  2014-09-07  Source1    
         1  2014-09-14  Source2    
         2  2014-09-15  Source3    
         1  2014-09-16  Source4    

We'll start by creating a saved Access query named [Date_Gaps]

SELECT
    pi.CustomerID, 
    pi.Product_Interest_Date,
    l.Lead_Date,
    Abs(DateDiff("d", pi.Product_Interest_Date, l.Lead_Date)) AS Date_Gap
FROM 
    Test_PI pi
    INNER JOIN 
    Test_Leads l
        ON pi.CustomerID = l.CustomerID

returning

CustomerID  Product_Interest_Date  Lead_Date   Date_Gap
----------  ---------------------  ----------  --------
         1  2014-09-07             2014-09-07         0
         1  2014-09-08             2014-09-07         1
         1  2014-09-15             2014-09-07         8
         1  2014-09-28             2014-09-07        21
         1  2014-09-07             2014-09-14         7
         1  2014-09-08             2014-09-14         6
         1  2014-09-15             2014-09-14         1
         1  2014-09-28             2014-09-14        14
         1  2014-09-07             2014-09-16         9
         1  2014-09-08             2014-09-16         8
         1  2014-09-15             2014-09-16         1
         1  2014-09-28             2014-09-16        12

Now the query

SELECT 
    CustomerID,
    Product_Interest_Date,
    Min(Date_Gap) AS MinOfDate_Gap
FROM Date_Gaps
GROUP BY
    CustomerID, 
    Product_Interest_Date

returns

CustomerID  Product_Interest_Date  MinOfDate_Gap
----------  ---------------------  -------------
         1  2014-09-07                         0
         1  2014-09-08                         1
         1  2014-09-15                         1
         1  2014-09-28                        12

so if we simply join back into the [Date_Gaps] query to get the [Lead_Date]

SELECT
    mingap.CustomerID,
    mingap.Product_Interest_Date,
    Date_Gaps.Lead_Date
FROM
    Date_Gaps
    INNER JOIN
    (
        SELECT 
            CustomerID,
            Product_Interest_Date,
            Min(Date_Gap) AS MinOfDate_Gap
        FROM Date_Gaps
        GROUP BY
            CustomerID, 
            Product_Interest_Date
    ) mingap
        ON Date_Gaps.CustomerID = mingap.CustomerID
            AND Date_Gaps.Product_Interest_Date = mingap.Product_Interest_Date
            AND Date_Gaps.Date_Gap = mingap.MinOfDate_Gap

we get

CustomerID  Product_Interest_Date  Lead_Date 
----------  ---------------------  ----------
         1  2014-09-07             2014-09-07
         1  2014-09-08             2014-09-07
         1  2014-09-15             2014-09-14
         1  2014-09-15             2014-09-16
         1  2014-09-28             2014-09-16

Notice that we get two hits for 09-15 because they both have a gap of 1 day (before and after). So, we need to break that tie by wrapping the above query in an aggregation query using Min(Lead_Date) (or Max(Lead_Date), your choice)

SELECT
    CustomerID,
    Product_Interest_Date,
    Min(Lead_Date) AS MinOfLead_Date
FROM
    (
        SELECT
            mingap.CustomerID,
            mingap.Product_Interest_Date,
            Date_Gaps.Lead_Date
        FROM
            Date_Gaps
            INNER JOIN
            (
                SELECT 
                    CustomerID,
                    Product_Interest_Date,
                    Min(Date_Gap) AS MinOfDate_Gap
                FROM Date_Gaps
                GROUP BY
                    CustomerID, 
                    Product_Interest_Date
            ) mingap
                ON Date_Gaps.CustomerID = mingap.CustomerID
                    AND Date_Gaps.Product_Interest_Date = mingap.Product_Interest_Date
                    AND Date_Gaps.Date_Gap = mingap.MinOfDate_Gap
    )
GROUP BY
    CustomerID,
    Product_Interest_Date

to give us

CustomerID  Product_Interest_Date  MinOfLead_Date
----------  ---------------------  --------------
         1  2014-09-07             2014-09-07    
         1  2014-09-08             2014-09-07    
         1  2014-09-15             2014-09-14    
         1  2014-09-28             2014-09-16    

So now we're ready to JOIN up the original tables

SELECT 
    Test_PI.CustomerID,
    Test_PI.Product_Interest_Date,
    Test_PI.Product_Interest,
    Test_Leads.Lead_Date,
    Test_Leads.Lead_Source
FROM
    (
        Test_PI
        INNER JOIN
        (
            SELECT
                CustomerID,
                Product_Interest_Date,
                Min(Lead_Date) AS MinOfLead_Date
            FROM
                (
                    SELECT
                        mingap.CustomerID,
                        mingap.Product_Interest_Date,
                        Date_Gaps.Lead_Date
                    FROM
                        Date_Gaps
                        INNER JOIN
                        (
                            SELECT 
                                CustomerID,
                                Product_Interest_Date,
                                Min(Date_Gap) AS MinOfDate_Gap
                            FROM Date_Gaps
                            GROUP BY
                                CustomerID, 
                                Product_Interest_Date
                        ) mingap
                            ON Date_Gaps.CustomerID = mingap.CustomerID
                                AND Date_Gaps.Product_Interest_Date = mingap.Product_Interest_Date
                                AND Date_Gaps.Date_Gap = mingap.MinOfDate_Gap
                )
            GROUP BY
                CustomerID,
                Product_Interest_Date
        ) closest
            ON Test_PI.CustomerID = closest.CustomerID
                AND Test_PI.Product_Interest_Date = closest.Product_Interest_Date
    )
    INNER JOIN
    Test_Leads
        ON Test_Leads.CustomerID = closest.CustomerID
            AND Test_Leads.Lead_Date = closest.MinOfLead_Date

returning

CustomerID  Product_Interest_Date  Product_Interest  Lead_Date   Lead_Source
----------  ---------------------  ----------------  ----------  -----------
         1  2014-09-07             Interest1         2014-09-07  Source1    
         1  2014-09-08             Interest2         2014-09-07  Source1    
         1  2014-09-15             Interest3         2014-09-14  Source2    
         1  2014-09-28             Interest4         2014-09-16  Source4    
like image 147
Gord Thompson Avatar answered Oct 15 '22 16:10

Gord Thompson