Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to replace IN clause with JOIN in Oracle SQL?

I'm trying to rewrite the below query to replace the 'IN' clause with inner join

select * from employee_rec er 
  inner join ed_claim_recd ed on er.ssn=ed.insssn and substr(er.group_rec_key,1,10) = substr(er.group_rec_key,1,10) 
    and ed.claim in (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101')
    and ed.insssn in (select er1.ssn from employee_rec er1 where er1.status != 'ACTIV' and trim(ER1.CLAIMNO) is null)
    and er.sysind not in ('ABC,'BCD')

Below is what I could come up with, but the results are not same as the previous query

select * from employee_rec er
  inner join ed_claim_recd ed on er.ssn = ed.insssn and substr(er.group_rec_key, 1, 10) = substr(er.group_rec_key, 1, 10)
  inner join (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno
  inner join (
    select insssn from ed_claim_recd ed2
      inner join (
        select ssn from employee_rec er1
        where
          er1.status != 'ACTIV'
          and trim(ER1.CLAIMNO) is null
      ) er2 on ed2.insssn = er2.ssn
  ) ed3 on ed.insssn = ed3.insssn
  and er.sysind not in ('ABC', 'BCD')

Is this the right approach to rewrite the query or am I way too off? Also, is it an efficient way to rewrite the query to replace "IN' with 'INNER JOIN'?

like image 885
zephyrus Avatar asked Mar 11 '26 23:03

zephyrus


2 Answers

IN subquery and INNER JOIN work differently. Join will output for each join key from one table all rows with the same key from joined table. So, Join can duplicate rows if a join key in the joined table is not unique. IN subquery will not duplicate rows.

For example if in your cr join subquery

inner join (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno

the claimno is not unique, then joined rows with matching claimno will be duplicated. This is quite normal Join behavior.

To avoid such duplication, make sure that join key is unique by adding DISTINCT, row_number() filter, group by, etc:

inner join (select DISTINCT claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno

And the same for other such joins.

In this case the result of IN and Join should be the same.

BTW you do not need all these conditions together:

where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101'

because '2130101' is greater than '0000000' and if cd.closedt >= '2130101' it cannot be NULL. cd.closedt >= '2130101' is quite enough.

Have found one more possible problem:

and trim(ER1.CLAIMNO) is null

In Hive (and you have tagged your question with @hive tag) empty string and null are two different things.

('' is not NULL) = true in Hive.

I'd suggest to replace it with and (ER.CLAIMNO is null or trim(ER1.CLAIMNO)='') Empty string is a normal value in Hive, that is why empty strings participate in joins. Convert to NULL or filter them before join if you do not need them to be joined.

ed3 subquery contains redundant join, it is not like original IN subquery.

Maybe there are other issues. Test all joins one by one to find all of them

like image 189
leftjoin Avatar answered Mar 14 '26 12:03

leftjoin


I make it this (untested):

select er.*, ed.*
from   employee_rec er 
       join ed_claim_recd ed
            on  ed.insssn = er.ssn
            and substr(ed.group_rec_key,1,10) /* was er.group_rec_key */ = substr(er.group_rec_key,1,10)
       join cd_claim_recd cd
            on  cd.claimno = ed.claim
       join employee_rec er1
            on  er1.ssn = ed.insssn
where  er.sysind not in ('ABC', 'BCD')
-- and    cd.closedt is not null   -- redundant
-- and    cd.closedt != '0000000'  -- redundant
and    cd.closedt >= '2130101'
and    er1.status != 'ACTIV'
and    trim(er1.claimno) is null

If cd_claim_recd.claimno and employee_rec.ssn and not unique keys then you may need some dedupe logic.

like image 35
William Robertson Avatar answered Mar 14 '26 12:03

William Robertson



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!