The join documentation for Hive encourages the use of implicit joins, i.e.
SELECT * 
FROM table1 t1, table2 t2, table3 t3 
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';
Is this equivalent to
SELECT t1.*, t2.*, t3.*
FROM table1 t1
INNER JOIN table2 t2 ON
  t1.id = t2.id 
INNER JOIN table3 t3 ON
  t2.id = t3.id
WHERE t1.zipcode = '02535'
, or will the above return additional records?
Not always. Your queries are equivalent. But without WHERE t1.id = t2.id AND t2.id = t3.id it will be CROSS JOIN.
Update:
This is interesting question and I decided to add some demo. Let's create two tables:
A(c1 int, c2 string) and B(c1 int, c2 string).
Load data:
insert into table A
 select 1, 'row one' union all
 select 2, 'row two';
insert into table B
 select 1, 'row one' union all
 select 3, 'row three';
Check data:
hive> select * from A;
OK
1       row one
2       row two
Time taken: 1.29 seconds, Fetched: 2 row(s)
hive> select * from B;
OK
1       row one
3       row three
Time taken: 0.091 seconds, Fetched: 2 row(s)
Check cross join (implicit join without where transformed to cross):
hive> select a.c1, a.c2, b.c1, b.c2 from a,b;
Warning: Map Join MAPJOIN[14][bigTable=a] in task 'Stage-3:MAPRED' is a cross product
Warning: Map Join MAPJOIN[22][bigTable=b] in task 'Stage-4:MAPRED' is a cross product
Warning: Shuffle Join JOIN[4][tables = [a, b]] in Stage 'Stage-1:MAPRED' is a cross product
OK
1       row one 1       row one
2       row two 1       row one
1       row one 3       row three
2       row two 3       row three
Time taken: 54.804 seconds, Fetched: 4 row(s)
Check inner join (implicit join with where works as INNER):
hive> select a.c1, a.c2, b.c1, b.c2 
        from a,b 
       where a.c1=b.c1;
OK
1       row one 1       row one
Time taken: 38.413 seconds, Fetched: 1 row(s)
Try to perform left join by adding OR b.c1 is null to the where:
hive> select a.c1, a.c2, b.c1, b.c2 
        from a,b 
       where (a.c1=b.c1) OR (b.c1 is null);
OK
1       row one 1       row one
Time taken: 57.317 seconds, Fetched: 1 row(s)
As you can see we got inner join again. or b.c1 is null is ignored
Now left join without where and ON clause (transformed to CROSS):
 select a.c1, a.c2, b.c1, b.c2 
   from a left join b;
OK
1       row one 1       row one
1       row one 3       row three
2       row two 1       row one
2       row two 3       row three
Time taken: 37.104 seconds, Fetched: 4 row(s)
As you can see we got cross again.
Try left join with where clause and without ON (works as INNER):
select a.c1, a.c2, b.c1, b.c2 
  from a left join b 
 where a.c1=b.c1;
OK
1       row one 1       row one
Time taken: 40.617 seconds, Fetched: 1 row(s)
We got INNER join
Try left join with where clause and without ON+ try to allow nulls:
 select a.c1, a.c2, b.c1, b.c2 
   from a 
   left join b 
  where a.c1=b.c1 
     or b.c1 is null;
OK
1       row one 1       row one
Time taken: 53.873 seconds, Fetched: 1 row(s)
Again got INNER.  or b.c1 is null is ignored.
Left join with on clause:
hive> select a.c1, a.c2, b.c1, b.c2 
        from a left join b on a.c1=b.c1;
OK
1       row one 1       row one
2       row two NULL    NULL
Time taken: 48.626 seconds, Fetched: 2 row(s)
Yes, it is true left join.
Left join with on + where (got INNER):
hive> select a.c1, a.c2, b.c1, b.c2 
        from a left join b on a.c1=b.c1 
       where a.c1=b.c1;
OK
1       row one 1       row one
Time taken: 49.54 seconds, Fetched: 1 row(s)
We got INNER because WHERE does not allow NULLS.
Left join with where + allow nulls:
hive> select a.c1, a.c2, b.c1, b.c2 
        from a left join b on a.c1=b.c1 
       where a.c1=b.c1 
          or b.c1 is null;
OK
1       row one 1       row one
2       row two NULL    NULL
Time taken: 55.951 seconds, Fetched: 2 row(s)
Yes, it is left join.
Conclusion:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With