Hive foreign keys?

Tags: hive

I am new to Hive. I have searched various websites, but none gave me a clear picture of the following:

A> Foreign keys: The general Hive material never mentions foreign keys. How, then, do we enforce referential constraints? (I am aware of the JOIN ON syntax; does using it mean the two tables have a primary key:foreign key relationship?) Is there a deeper reason for not supporting foreign keys?

B> Float equality comparison: There seems to be a problem with this. For instance, to check whether A=3.5, I would write "A>3.49 and A<3.51". Is this the right way?
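To make part B concrete, here is a minimal Python sketch (not Hive itself) of why exact equality on floats can fail and what the range check does. It assumes the value is stored as a 32-bit float and compared against a 64-bit literal; whether a given Hive query behaves this way depends on the column type and Hive version, so treat that as an assumption. The helper names `to_float32` and `approx_equal` are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def approx_equal(a: float, target: float, eps: float = 0.01) -> bool:
    """Tolerance check, the same idea as: target - eps < a < target + eps."""
    return abs(a - target) < eps


stored_exact = to_float32(3.5)    # 3.5 is exactly representable in binary
stored_inexact = to_float32(0.1)  # 0.1 is not, so the 32-bit value differs

print(stored_exact == 3.5)                 # True
print(stored_inexact == 0.1)               # False
print(approx_equal(stored_inexact, 0.1))   # True
```

Under these assumptions, a value like 3.5 survives the round trip unchanged, while a value like 0.1 does not, which is where a tolerance such as "A>3.49 and A<3.51" becomes useful.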

Are there any references or materials that would help with writing HQL?

Appreciate any help,

Thanks -Shiree

Shiree asked Mar 14 '12 05:03



3 Answers

Hive is schema-on-read, so it does not enforce referential integrity on datasets. Instead, integrity has to be ensured by the source system and, more importantly, by the queries you run in Hive.
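As a minimal sketch of pushing the integrity check into a query, the example below finds "orphan" rows whose foreign-key value has no match in the parent table. It uses SQLite from Python as a stand-in for Hive, purely so the example is runnable; the table and column names (`customers`, `orders`, `customer_id`) are hypothetical. The LEFT OUTER JOIN ... IS NULL pattern itself is ordinary SQL that Hive also accepts.

```python
import sqlite3

# In-memory SQLite database standing in for Hive tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    -- Order 12 points at customer 99, which does not exist.
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

# Anti-join: keep only child rows with no matching parent row.
orphan_query = """
    SELECT o.order_id, o.customer_id
    FROM orders o
    LEFT OUTER JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
"""
orphans = conn.execute(orphan_query).fetchall()
print(orphans)  # [(12, 99)]
```

A check like this can run after each load; an empty result means every child row has a parent, which is the guarantee a foreign key would otherwise provide.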

Matt Tucker answered Oct 28 '22 10:10



Primary/foreign key constraint support is available as of Hive 2.1.0. See the 2.1.0 release notes.

plain_text answered Oct 28 '22 10:10



Hive does not currently support FK/PK constraints.

But that may change in the future. Constraints would give Hive's cost-based optimizer (CBO) more information for better cardinality estimates and better query rewrites:

https://issues.apache.org/jira/browse/HIVE-13019

https://issues.apache.org/jira/browse/HIVE-6905

In response to Mo K's answer: constraints do not necessarily mean overhead. Oracle, for example, has "RELY NOVALIDATE" constraints, so the CBO (or Hive CBO in this case) relies on the constraint for its query optimizations but does not have to actually check that the constraint holds.

Edit 02/18/2016: I've created https://issues.apache.org/jira/browse/HIVE-13076; please vote it up if you're interested in that feature.

Edit 07/25/2016: https://issues.apache.org/jira/browse/HIVE-13076 is resolved as of 06/2016 and should be landing in Hive 2.1. I don't yet see updates in the official documentation.

Tagar answered Oct 28 '22 12:10
