Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server: row present in one query, missing in another

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).

I am writing an accounting system that works on the double-entry principal -- money always moves between accounts, a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.

Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:

create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
    from Accounting.TransactionParts p 
    group by p.[Transaction], p.Account

I then have a view to calculate the running balance column, as follows:

create view Accounting.TransactionBalanceView as
with cte as
(
    select ROW_NUMBER() over (order by t.[Date]) AS RowNumber, 
                         t.ID as [Transaction], p.Amount, p.Account
        from Accounting.Transactions t
            inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account, 
                coalesce(sum(a.Amount), 0) as Balance
    from cte a, cte b
    where a.RowNumber <= b.RowNumber AND a.Account=b.Account
    group by b.RowNumber, b.[Transaction], a.Account

For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running

select * from Accounting.TransactionBalanceView where [Transaction]=30

This gave me the following result:

RowNumber            Transaction Account Balance
-------------------- ----------- ------- ---------------------
72                   30          23      143.80

As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and run a query to see if there's anything else missing:

select [Transaction], count(*) 
    from Accounting.TransactionBalanceView 
    group by [Transaction] 
    having count(*) < 2

This query returns no results -- not even for Transaction 30! Thinking I must be an idiot I run the following query:

select [Transaction] 
    from Accounting.TransactionBalanceView 
    where [Transaction]=30

It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?

Thanks a lot if you've stuck with me this far!

Edit:

Here are the execution plans:

select *
select [Transaction]

1000 lines each, hence finding somewhere else to host.

Edit 2:

For completeness, here are the tables I used:

create table Accounting.Accounts
(
ID              smallint        identity primary key,
[Name]          varchar(50)     not null
    constraint UQ_AccountName unique,
[Type]          tinyint         not null
    constraint FK_AccountType foreign key references Accounting.AccountTypes
);

create table Accounting.Transactions
(
ID              int             identity primary key,
[Date]          date            not null default getdate(),
[Description]   varchar(50)     not null,
Reference       varchar(20)     not null default '',
Memo            varchar(1000)   not null
);

create table Accounting.TransactionParts
(
ID              int             identity primary key,
[Transaction]   int             not null
    constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account         smallint        not null
    constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount          money           not null,
VatRelated      bit             not null default 0
);
like image 773
Gordon Leigh Avatar asked Jul 29 '10 11:07

Gordon Leigh


2 Answers

Demonstration of possible explanation.

Create table Script

SELECT *
INTO #T
 FROM master.dbo.spt_values

 CREATE NONCLUSTERED INDEX [IX_T] ON  #T ([name] DESC,[number] DESC);

Query one (Returns 35 results)

WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
 FROM  #T
)
SELECT c1.number,c1.[type] 
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number

Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results) ;

WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
 FROM  #T
)
SELECT c1.number,c1.[type] ,c2.[type] 
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number

Why?

row_number() for duplicate NAMEs isn't specified so it just chooses whichever one fits in with the best execution plan for the required output columns. In the second query this is the same for both cte invocations, in the first one it chooses a different access path with resultant different row_numbering.

Execution Plan

Suggested Solution

You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])

Contrary to what may have been expected the CTE will likely not be materialised which would have ensured consistency for the self join and thus you assume a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.

What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)

like image 116
Martin Smith Avatar answered Oct 27 '22 08:10

Martin Smith


If the purpose of this part of the view is just to make sure that the same row isn't joined to itself

where a.RowNumber <= b.RowNumber

then how does changing this part to

where a.RowNumber <> b.RowNumber

affect the results?

like image 27
etoisarobot Avatar answered Oct 27 '22 07:10

etoisarobot