 

Last Access Time Update in Hive metastore

I am using the following property in my Hive console / .hiverc file so that whenever I query a table, it updates the LAST_ACCESS_TIME column of the TBLS table in the Hive metastore:

set hive.exec.pre.hooks = org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec;
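One way to tell whether the hook has fired is to read LAST_ACCESS_TIME directly from the metastore's backing database. Below is a minimal sketch, assuming the standard metastore tables TBLS and DBS (joined on DB_ID) and that LAST_ACCESS_TIME holds seconds since the Unix epoch, with 0 meaning no access has been recorded. The helper name `last_access_time` is hypothetical, and an in-memory sqlite3 database stands in for the real MySQL/PostgreSQL metastore so the sketch runs on its own. Against a real metastore you would pass a DB-API connection from your driver instead, keeping in mind that drivers such as PyMySQL use `%s` placeholders rather than `?`.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Join TBLS to DBS to look up one table by database name and table name.
# "?" is the sqlite3 placeholder style; PyMySQL and similar drivers use "%s".
QUERY = """
SELECT t.LAST_ACCESS_TIME
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE d.NAME = ? AND t.TBL_NAME = ?
"""


def last_access_time(conn, db_name: str, table_name: str) -> Optional[datetime]:
    """Return LAST_ACCESS_TIME as a UTC datetime, or None.

    None means the table was not found or the column is still 0
    (no access recorded yet).
    """
    cur = conn.cursor()
    try:
        cur.execute(QUERY, (db_name, table_name))
        row = cur.fetchone()
    finally:
        cur.close()
    if row is None or not row[0]:
        return None
    # Assumed to be seconds since the Unix epoch.
    return datetime.fromtimestamp(row[0], tz=timezone.utc)


if __name__ == "__main__":
    # Hypothetical stand-in for the metastore: only the columns used above.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                           TBL_NAME TEXT, LAST_ACCESS_TIME INTEGER);
        INSERT INTO DBS VALUES (1, 'db');
        INSERT INTO TBLS VALUES (10, 1, 'sometable', 0);
        """
    )
    # Prints None because the stand-in row still has LAST_ACCESS_TIME = 0.
    print(last_access_time(conn, "db", "sometable"))
    conn.close()
```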

However, when I use spark-sql or spark-shell, it does not seem to work, and LAST_ACCESS_TIME does not get updated in the Hive metastore.

Here's how I am reading the table:

>>> df = spark.sql("select * from db.sometable")
>>> df.show()

I have set up the above hook in hive-site.xml in both /etc/hive/conf and /etc/spark/conf.
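Since the hook has to be picked up from hive-site.xml in both locations, it can be worth confirming that the property really is present in each file. A minimal sketch, assuming the usual Hadoop-style layout of a `<configuration>` root with `<property>` children that each hold a `<name>` and a `<value>`; the function name `get_property` and the script name `check_hook.py` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional

HOOK_PROPERTY = "hive.exec.pre.hooks"


def get_property(root: ET.Element, name: str) -> Optional[str]:
    """Return the <value> of the <property> called `name`, or None if absent."""
    # Hadoop-style configs keep each <property> as a direct child of the root.
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


if __name__ == "__main__":
    # Example: python check_hook.py /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml
    for path in sys.argv[1:]:
        root = ET.parse(path).getroot()
        print(f"{path}: {HOOK_PROPERTY} = {get_property(root, HOOK_PROPERTY)}")
```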

Nitish Sharma asked Nov 07 '22 09:11


1 Answer

Your code may be bypassing some of the Hive integrations. My recollection is that to get more of the Hive-specific integrations, you need to bring in a HiveContext, something like this:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

if __name__ == "__main__":
    # create a Spark context with the application's Spark configuration
    conf = SparkConf().setAppName("Data Frame Join")
    sc = SparkContext(conf=conf)
    # HiveContext runs SQL against tables registered in the Hive metastore
    sqlContext = HiveContext(sc)
    df_07 = sqlContext.sql("SELECT * FROM sample_07")
    df_07.show()

https://docs.cloudera.com/runtime/7.2.7/developing-spark-applications/topics/spark-sql-example.html
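On Spark 2.0 and later, HiveContext is deprecated in favor of a SparkSession built with enableHiveSupport(). A minimal sketch of the same example in that style, reusing the app name and the sample_07 table from the snippet above; the helper names `build_hive_session` and `main` are hypothetical. It prints `spark.sql.catalogImplementation`, which should be "hive" for a Hive-backed session. This only shows how to get Hive-enabled access; it does not by itself confirm that the pre-exec hook fires.

```python
def build_hive_session(app_name: str):
    """Create (or reuse) a SparkSession backed by the Hive metastore."""
    # Imported here so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # the replacement for the deprecated HiveContext
        .getOrCreate()
    )


def main() -> None:
    spark = build_hive_session("Data Frame Join")
    # Expected to print "hive" when the session uses the Hive metastore catalog.
    print(spark.conf.get("spark.sql.catalogImplementation"))
    df_07 = spark.sql("SELECT * FROM sample_07")
    df_07.show()
    spark.stop()


if __name__ == "__main__":
    main()
```

In the pyspark shell a `spark` session already exists, so getOrCreate() returns it rather than building a new one. Running `spark.conf.get("spark.sql.catalogImplementation")` there is a quick way to check whether the shell's session is already Hive-backed.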

Hope this helps

Douglas M answered Nov 15 '22 11:11