I am new to Hive. I have searched various websites, but none gave me a clear picture of the following:
A> Foreign keys: The general Hive documentation never mentions foreign keys. How, then, do we enforce referential constraints? (I am aware of the JOIN ... ON syntax; does that mean the two tables have a primary key/foreign key relationship?) Is there a deeper reason for not supporting foreign keys?
B> Float equality comparison: There seems to be a problem with this. For instance, to check whether A = 3.5, I write "A > 3.49 AND A < 3.51". Is this the right way?
Are there any references or materials that could help with writing HQL?
Appreciate any help,
Thanks -Shiree
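On part B, the range check in the question is one form of tolerance-based comparison. The sketch below illustrates why exact equality can fail, assuming the column is stored as a 32-bit float and compared against a 64-bit value. It uses plain Python rather than Hive, and the helper names `as_float32` and `approx_equal` are made up for illustration.

```python
import struct


def as_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def approx_equal(a: float, b: float, tol: float = 1e-6) -> bool:
    """Tolerance-based equality: true when a and b differ by at most tol."""
    return abs(a - b) <= tol


# 3.5 has an exact binary representation, so exact equality happens to work.
exact_case = as_float32(3.5) == 3.5

# 0.1 does not, so the 32-bit stored value and the 64-bit literal disagree.
stored = as_float32(0.1)
naive_match = stored == 0.1
tolerant_match = approx_equal(stored, 0.1)
```

The same idea can be written as a single HiveQL predicate such as `abs(A - 3.5) < 0.01`, which is equivalent to the two-sided range in the question. The tolerance is a choice that depends on the data.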
Hive is implemented as schema-on-read, so Hive itself performs no referential integrity checks on datasets. Instead, integrity has to be enforced by the source system and, more importantly, by the queries that are executed in Hive.
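A query-side integrity check of this kind could look like the sketch below: a left outer join that lists child rows with no matching parent. It runs against an in-memory SQLite database as a stand-in for Hive, and the `customers` and `orders` tables are hypothetical. The SELECT itself uses only join syntax that Hive also accepts.

```python
import sqlite3

# Anti-join: keep only orders whose customer_id has no match in customers.
# Rows with a NULL customer_id are also reported, since NULL never matches.
ORPHAN_QUERY = """
SELECT o.order_id, o.customer_id
FROM orders o
LEFT OUTER JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
ORDER BY o.order_id
"""


def find_orphans(conn: sqlite3.Connection) -> list:
    """Return (order_id, customer_id) pairs that violate the intended FK."""
    return conn.execute(ORPHAN_QUERY).fetchall()


# Hypothetical sample data: order 12 points at a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'a'), (2, 'b');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3);
""")

orphans = find_orphans(conn)
print(orphans)
```

An empty result means the data currently satisfies the relationship. Nothing prevents later loads from breaking it, so the check would need to be rerun.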
Primary and foreign key constraint support is available as of Hive 2.1.0; see the 2.1.0 release notes. Note that these constraints are declared but not validated by Hive.
Hive does not currently support FK/PK constraints.
But that may change in the future. Constraints would give the Hive CBO (cost-based optimizer) more information for better cardinality estimates and better query rewrites:
https://issues.apache.org/jira/browse/HIVE-13019
https://issues.apache.org/jira/browse/HIVE-6905
In response to Mo K's answer: constraints do not necessarily mean overhead. Oracle, for example, has "RELY NOVALIDATE" constraints. The CBO (Hive's CBO, in this case) relies on such a constraint for its query optimizations but does not have to check that the constraint actually holds.
Edit 02/18/2016: I've created https://issues.apache.org/jira/browse/HIVE-13076; please vote it up if you're interested in that feature.
Edit 07/25/2016: https://issues.apache.org/jira/browse/HIVE-13076 was resolved in 06/2016 and should be landing in Hive 2.1. I don't yet see updates in the official documentation.