I have a many-to-many relationship between invoices and credit card transactions, which I'm trying to map sums of together. The best way to think of the problem is to imagine TransactionInvoiceMap as a bipartite graph. For each connected subgraph, find the total of all invoices and the total of all transactions within that subgraph. In my query, I want to return the values computed for each of these subgraphs along with the transaction ids they're associated with. Totals for related transactions should be identical.
More explicitly, given the following transactions/invoices
Table: TransactionInvoiceMap
TransactionID InvoiceID
1 1
2 2
3 2
3 3
Table: Transactions
TransactionID Amount
1 $100
2 $75
3 $75
Table: Invoices
InvoiceID Amount
1 $100
2 $100
3 $50
my desired output is
TransactionID TotalAsscTransactions TotalAsscInvoiced
1 $100 $100
2 $150 $150
3 $150 $150
Note that invoices 2 and 3 and transactions 2 and 3 are part of a logical group.
Here's a solution (simplified, names changed) that apparently works, but is very slow. I'm having a hard time figuring out how to optimize this, but I think it would involve eliminating the subqueries into TransactionInvoiceGrouping. Feel free to suggest something radically different.
with TransactionInvoiceGrouping as (
select
-- Need an identifier for each logical group of transactions/invoices, use
-- one of the transaction ids for this.
m.TransactionID,
m.InvoiceID,
min(m.TransactionID) over (partition by m.InvoiceID) as GroupingID
from TransactionInvoiceMap m
)
select distinct
g.TransactionID,
istat.InvoiceSum as TotalAsscInvoiced,
tstat.TransactionSum as TotalAsscTransactions
from TransactionInvoiceGrouping g
cross apply (
select sum(ii.Amount) as InvoiceSum
from (select distinct InvoiceID, GroupingID from TransactionInvoiceGrouping) ig
inner join Invoices ii on ig.InvoiceID = ii.InvoiceID
where ig.GroupingID = g.GroupingID
) as istat
cross apply (
select sum(it.Amount) as TransactionSum
from (select distinct TransactionID, GroupingID from TransactionInvoiceGrouping) ig
left join Transactions it on ig.TransactionID = it.TransactionID
where ig.GroupingID = g.GroupingID
having sum(it.Amount) > 0
) as tstat
I've implemented the solution in a recursive CTE:
;with TranGroup as (
select TransactionID
, InvoiceID as NextInvoice
, TransactionID as RelatedTransaction
, cast(TransactionID as varchar(8000)) as TransactionChain
from TransactionInvoiceMap
union all
select g.TransactionID
, m1.InvoiceID
, m.TransactionID
, g.TransactionChain + ',' + cast(m.TransactionID as varchar(11))
from TranGroup g
join TransactionInvoiceMap m on g.NextInvoice = m.InvoiceID
join TransactionInvoiceMap m1 on m.TransactionID = m1.TransactionID
where ',' + g.TransactionChain + ',' not like '%,' + cast(m.TransactionID as varchar(11)) + ',%'
)
, RelatedTrans as (
select distinct TransactionID, RelatedTransaction
from TranGroup
)
, RelatedInv as (
select distinct TransactionID, NextInvoice as RelatedInvoice
from TranGroup
)
select TransactionID
, (
select sum(Amount)
from Transactions
where TransactionID in (
select RelatedTransaction
from RelatedTrans
where TransactionID = t.TransactionID
)
) as TotalAsscTransactions
, (
select sum(Amount)
from Invoices
where InvoiceID in (
select RelatedInvoice
from RelatedInv
where TransactionID = t.TransactionID
)
) as TotalAsscInvoiced
from Transactions t
There is probably some room for optimization (including object naming on my part!) but I believe I have at least a correct solution which will gather all possible Transaction-Invoice relations to include in the calculations.
I was unable to get the existing solutions on this page to give the OP's desired output, and they got uglier as I added more test data. I'm not sure if the OP's posted "slow" solution is correct as stated. It's very possible that I'm misinterpreting the question.
Additional info:
I've often seen that recursive queries can be slow when working with large sets of data. Perhaps that can be the subject of another SO question. If that's the case, things to try on the SQL side might be to limit the range (add where
clauses), index base tables, select the CTE into a temp table first, index that temp table, think of a better stop condition for the CTE...but profile first, of course.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With