I am using the following property in my Hive console / .hiverc file so that, whenever I query a table, the LAST_ACCESS_TIME column of the TBLS table in the Hive metastore gets updated:
set hive.exec.pre.hooks = org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec;
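As I understand it, this pre-execution hook runs before each Hive query and stamps every input table with the current time in seconds since the epoch, which the metastore keeps in TBLS.LAST_ACCESS_TIME (0 when it was never set). Below is a hypothetical, simplified in-memory model of that behaviour, not Hive's actual implementation; the `Table` class and both function names are made up for illustration.

```python
import time
from datetime import datetime, timezone
from typing import Iterable, Optional


class Table:
    """Hypothetical stand-in for one row of the metastore's TBLS table."""

    def __init__(self, name: str, last_access_time: int = 0) -> None:
        self.name = name
        # Epoch seconds, like TBLS.LAST_ACCESS_TIME; 0 means "never recorded".
        self.last_access_time = last_access_time


def pre_exec_hook(inputs: Iterable[Table], now: Optional[float] = None) -> None:
    """Simplified model of an access-time pre-hook: stamp every input table."""
    stamp = int(time.time() if now is None else now)
    for table in inputs:
        table.last_access_time = stamp


def last_access_as_datetime(table: Table) -> Optional[datetime]:
    """Convert stored epoch seconds to a UTC datetime, or None if never set."""
    if table.last_access_time <= 0:
        return None
    return datetime.fromtimestamp(table.last_access_time, tz=timezone.utc)


t = Table("db.sometable")
print(last_access_as_datetime(t))  # None: the hook has not run yet
pre_exec_hook([t], now=1_600_000_000)
print(last_access_as_datetime(t))  # 2020-09-13 12:26:40+00:00
```

Reading the real LAST_ACCESS_TIME value back the same way (epoch seconds, 0 for unset) is one quick way to see whether a given query touched it.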
However, if I use spark-sql or spark-shell, it does not seem to work: LAST_ACCESS_TIME is not updated in the Hive metastore.
Here's how I am reading the table:
>>> df = spark.sql("select * from db.sometable")
>>> df.show()
I have set up the above hook in hive-site.xml in both /etc/hive/conf and /etc/spark/conf.
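To rule out a typo or a missing file, a small script can confirm that each hive-site.xml really defines the property. This is a sketch assuming the standard Hadoop-style `<configuration><property><name>/<value>` layout; the helper names `get_hadoop_property` and `check_hook_configured` are hypothetical. It only checks that the setting is present in the files; it says nothing about whether Spark acts on it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Iterable, Optional


def get_hadoop_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> whose <name> matches, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def check_hook_configured(
    conf_dirs: Iterable[str], name: str = "hive.exec.pre.hooks"
) -> Dict[str, Optional[str]]:
    """Map each <dir>/hive-site.xml to the property value (None if file or property is missing)."""
    results: Dict[str, Optional[str]] = {}
    for d in conf_dirs:
        path = Path(d) / "hive-site.xml"
        results[str(path)] = get_hadoop_property(path.read_text(), name) if path.is_file() else None
    return results


sample = """<configuration>
  <property>
    <name>hive.exec.pre.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec</value>
  </property>
</configuration>"""
print(get_hadoop_property(sample, "hive.exec.pre.hooks"))
# prints org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec
```

On the cluster itself, the call would be `check_hook_configured(["/etc/hive/conf", "/etc/spark/conf"])`.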
Your code may be bypassing some of the Hive integrations. My recollection is that, to get more of the Hive-specific integrations, you need to bring in a HiveContext, something like this:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

if __name__ == "__main__":
    # create Spark context with Spark configuration
    conf = SparkConf().setAppName("Data Frame Join")
    sc = SparkContext(conf=conf)
    # HiveContext resolves table metadata through the Hive metastore
    sqlContext = HiveContext(sc)
    df_07 = sqlContext.sql("SELECT * FROM sample_07")
    df_07.show()
https://docs.cloudera.com/runtime/7.2.7/developing-spark-applications/topics/spark-sql-example.html
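HiveContext has been deprecated since Spark 2.0 in favour of a SparkSession with Hive support enabled. A minimal sketch of the same setup in that style follows; the helper names `build_hive_session` and `uses_hive_catalog` are hypothetical, and checking `spark.sql.catalogImplementation` is simply one way to confirm the session talks to the Hive metastore. It does not by itself show that `hive.exec.pre.hooks` will fire.

```python
def build_hive_session(app_name: str = "Data Frame Join"):
    """Create (or reuse) a SparkSession that uses the Hive metastore as its catalog."""
    # Imported inside the function so this file can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # Spark 2.x+ replacement for HiveContext
        .getOrCreate()
    )


def uses_hive_catalog(spark) -> bool:
    """True if the session's catalog implementation is 'hive' rather than 'in-memory'."""
    return spark.conf.get("spark.sql.catalogImplementation", "in-memory") == "hive"
```

Usage would look like `spark = build_hive_session()`, then `uses_hive_catalog(spark)` and `spark.sql("SELECT * FROM sample_07").show()`.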
Hope this helps.