Background:
Steps followed:

1. Create a Hive table:
hive> CREATE TABLE orctest (
        PROD_ID bigint,
        CUST_ID bigint,
        TIME_ID timestamp,
        CHANNEL_ID bigint,
        PROMO_ID bigint,
        QUANTITY_SOLD decimal(10,0),
        AMOUNT_SOLD decimal(10,0)
      )
      CLUSTERED BY (PROD_ID) INTO 32 BUCKETS
      STORED AS ORC
      TBLPROPERTIES ("orc.compress"="SNAPPY", "transactional"="true");
2. Insert a record into orctest:
hive> insert into orctest values (1, 1, '2016-08-02 21:36:54.000000000', 1, 1, 10, 10000);
3. Try to access the orctest table from spark-shell:
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val s = hiveContext.table("orctest")
Exception thrown:
16/08/02 22:06:54 INFO OrcRelation: Listing hdfs://hadoop03:8020/apps/hive/warehouse/orctest on driver
16/08/02 22:06:54 INFO OrcRelation: Listing hdfs://hadoop03:8020/apps/hive/warehouse/orctest/delta_0000005_0000005 on driver
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:39)
at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:38)
at scala.Option.map(Option.scala:145)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
at org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
Any help would be really appreciated.
Spark SQL supports queries written in HiveQL, a SQL-like language whose queries are converted into Spark jobs. The Spark DataFrame API encapsulates the same data sources, organized into named columns, so a Hive table can be queried either way.
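For example, both styles can be used side by side from spark-shell. This is a minimal sketch; the table name sales is only illustrative (assume a plain, non-transactional Hive table):

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> // HiveQL: compiled into a Spark job, returns a DataFrame
scala> val df = hiveContext.sql("SELECT PROD_ID, AMOUNT_SOLD FROM sales")
scala> // DataFrame API: the same data exposed as named columns
scala> df.select("PROD_ID").show()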
To connect to the Hive metastore, you need to copy the hive-site.xml file into the spark/conf directory. After that, Spark will be able to connect to the Hive metastore.
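A quick way to confirm the metastore connection works is to list the databases from spark-shell (standard HiveContext calls, nothing specific to this setup):

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.sql("SHOW DATABASES").show()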
Spark SQL is designed to be compatible with the Hive metastore, SerDes, and UDFs. Currently, Hive SerDes and UDFs are based on Hive 1.2.1, and Spark SQL can connect to different versions of the Hive metastore (from 0.12.0 to 2.3.x).
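If you need Spark to talk to a metastore version other than the built-in one, the documented spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars properties can be passed when launching spark-shell. The values below are just an example for a Hive 1.2.1 metastore using the built-in client jars:

spark-shell --conf spark.sql.hive.metastore.version=1.2.1 \
            --conf spark.sql.hive.metastore.jars=builtin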
Try setting:

scala> hiveContext.setConf("spark.sql.hive.convertMetastoreOrc", "false")
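Putting it together in spark-shell (the property must be set before the table is first resolved):

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.setConf("spark.sql.hive.convertMetastoreOrc", "false")
scala> val s = hiveContext.table("orctest")
scala> s.show()

With convertMetastoreOrc disabled, Spark reads the table through the Hive SerDe instead of converting it to its native ORC relation, so the convertToOrcRelation path that trips the assertion (visible in the stack trace above) is never taken.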