Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I efficiently create logical subsets of data in a many-to-many mapping table?

I have a many-to-many relationship between invoices and credit card transactions, which I'm trying to map sums of together. The best way to think of the problem is to imagine TransactionInvoiceMap as a bipartite graph. For each connected subgraph, find the total of all invoices and the total of all transactions within that subgraph. In my query, I want to return the values computed for each of these subgraphs along with the transaction ids they're associated with. Totals for related transactions should be identical.

More explicitly, given the following transactions/invoices

Table: TransactionInvoiceMap
TransactionID  InvoiceID
1              1
2              2
3              2
3              3

Table: Transactions
TransactionID  Amount
1              $100
2              $75
3              $75

Table: Invoices
InvoiceID  Amount
1          $100
2          $100
3          $50

my desired output is

TransactionID  TotalAsscTransactions TotalAsscInvoiced
1              $100                  $100
2              $150                  $150
3              $150                  $150

Note that invoices 2 and 3 and transactions 2 and 3 are part of a logical group.

Here's a solution (simplified, names changed) that apparently works, but is very slow. I'm having a hard time figuring out how to optimize this, but I think it would involve eliminating the subqueries into TransactionInvoiceGrouping. Feel free to suggest something radically different.

with TransactionInvoiceGrouping as (
    select 
        -- Need an identifier for each logical group of transactions/invoices, use
        -- one of the transaction ids for this.
        m.TransactionID,
        m.InvoiceID,
        min(m.TransactionID) over (partition by m.InvoiceID) as GroupingID
    from TransactionInvoiceMap m
)
select distinct
    g.TransactionID,
    istat.InvoiceSum as TotalAsscInvoiced,
    tstat.TransactionSum as TotalAsscTransactions
from TransactionInvoiceGrouping g
    cross apply (
        select sum(ii.Amount) as InvoiceSum
        from (select distinct InvoiceID, GroupingID from TransactionInvoiceGrouping) ig
            inner join Invoices ii on ig.InvoiceID = ii.InvoiceID
        where ig.GroupingID = g.GroupingID
    ) as istat
    cross apply (
        select sum(it.Amount) as TransactionSum
        from (select distinct TransactionID, GroupingID from TransactionInvoiceGrouping) ig
            left join Transactions it on ig.TransactionID = it.TransactionID
        where ig.GroupingID = g.GroupingID
        having sum(it.Amount) > 0
    ) as tstat
like image 505
Samantha Branham Avatar asked Jul 13 '12 18:07

Samantha Branham


1 Answers

I've implemented the solution in a recursive CTE:

;with TranGroup as (
    select TransactionID
        , InvoiceID as NextInvoice
        , TransactionID as RelatedTransaction
        , cast(TransactionID as varchar(8000)) as TransactionChain
    from TransactionInvoiceMap
    union all
    select g.TransactionID
        , m1.InvoiceID
        , m.TransactionID
        , g.TransactionChain + ',' + cast(m.TransactionID as varchar(11))
    from TranGroup g
        join TransactionInvoiceMap m on g.NextInvoice = m.InvoiceID
        join TransactionInvoiceMap m1 on m.TransactionID = m1.TransactionID
    where ',' + g.TransactionChain + ',' not like '%,' + cast(m.TransactionID as varchar(11)) + ',%'
)
, RelatedTrans as (
    select distinct TransactionID, RelatedTransaction
    from TranGroup
)
, RelatedInv as (
    select distinct TransactionID, NextInvoice as RelatedInvoice
    from TranGroup
)
select TransactionID
    , (
        select sum(Amount)
        from Transactions
        where TransactionID in (
            select RelatedTransaction
            from RelatedTrans
            where TransactionID = t.TransactionID
        )
    ) as TotalAsscTransactions
    , (
        select sum(Amount)
        from Invoices
        where InvoiceID in (
            select RelatedInvoice
            from RelatedInv
            where TransactionID = t.TransactionID
        )
    ) as TotalAsscInvoiced
from Transactions t

There is probably some room for optimization (including object naming on my part!) but I believe I have at least a correct solution which will gather all possible Transaction-Invoice relations to include in the calculations.

I was unable to get the existing solutions on this page to give the OP's desired output, and they got uglier as I added more test data. I'm not sure if the OP's posted "slow" solution is correct as stated. It's very possible that I'm misinterpreting the question.

Additional info:

I've often seen that recursive queries can be slow when working with large sets of data. Perhaps that can be the subject of another SO question. If that's the case, things to try on the SQL side might be to limit the range (add where clauses), index base tables, select the CTE into a temp table first, index that temp table, think of a better stop condition for the CTE...but profile first, of course.

like image 190
Tim Lehner Avatar answered Oct 23 '22 01:10

Tim Lehner