I have a table with 14,028 rows from November 2012. I also have a table with 13,959 rows from March 2013. I am using a simple NOT IN()
clause to see who has left:
select * from nov_2012 where id not in(select id from mar_2013)
This returned 396 rows and I never thought anything of it, until I went to analyze who left. When I pulled all the ids for the lost members and put them in a temp table (##lost
), 32 of them were actually still in the mar_2013
table. I can pull them up when I search for their ids using the following:
select * from mar_2013 where id in(select id from ##lost)
I can't figure out what is going on. I will mention that the id
field I created is an IDENTITY
column. Could that have any effect on the matching using NOT IN
? Is there a better way to check for missing rows between tables? I have also tried:
select a.* from nov_2012 a left join mar_2013 b on b.id = a.id where b.id is NULL
And received the same results.
This is how I created the identity field;
create table id_lookup( dateofcusttable date ,sin int ,sex varchar(12) ,scid int identity(777000,1))
insert into id_lookup (sin, sex) select distinct sin, sex from [Client Raw].dbo.cust20130331 where sin <> 0 order by sin, sex
This is how I added the scid into the march table:
select scid, rowno as custrowno
into scid_20130331
from [Client Raw].dbo.cust20130331 cust
left join id_lookup scid
on scid.sin = cust.sin
and scid.sex = cust.sex
update scid_20130331
set scid = custrowno where scid is NULL --for members who don't have more than one id or sin information is not available
drop table Account_Part2_Current
select a.*, scid
into Account_Part2_Current
from Account_Part1_Current a
left join scid_20130331 b
on b.custrowno = a.rowno_custdmd_cust
I then group all the information by the scid
Consider replacing the not equal operator with equals (=) or inequality operators (>,>=,<,<=) if possible.
A NOT IN query will not return any rows if any NULL s exists in the list of NOT IN values. You can explicitly exclude them using IS NOT NULL as below. Or rewrite using NOT EXISTS instead. As well as having the semantics that you want the execution plan for NOT EXISTS is often simpler as looked at here.
EXISTS is used to determine if any values are returned or not. Whereas, IN can be used as a multiple OR operator. If the sub-query result is large, then EXISTS is faster than IN. Once the single positive condition is met in the EXISTS condition then the SQL Engine will stop the process.
I would prefer this form (and here's why):
SELECT a.id --, other columns
FROM dbo.nov_2012 AS a
WHERE NOT EXISTS (SELECT 1 FROM dbo.mar_2013 WHERE id = a.id);
However this should still give the same results as what you've tried, so I suspect there is something about the data model that you're not telling us - for example, is mar_2013.id
nullable?
this is logically equivalent to not in and is faster than not in.
where yourfield in
(select afield
from somewhere
minus
select
thesamefield
where you want to exclude the record
)
It probably isn't as fast as using where not exists, as per Aaron's answer so you should only use it if not exists does not provide the results you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With