I need to execute Hive queries on a remote Hive server from Spark, but for some reason I receive only the column names (without data). The data is available in the table; I checked it via HUE and a Java JDBC connection.
Here is my code example:
val test = spark.read
  .option("url", "jdbc:hive2://remote.hive.server:10000/work_base")
  .option("user", "user")
  .option("password", "password")
  .option("dbtable", "some_table_with_data")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .format("jdbc")
  .load()
test.show()
Output:
+-------+
|dst.col|
+-------+
+-------+
I know that data is available in this table.
Scala version: 2.11
Spark version: 2.1.0 (I also tried 2.1.1)
Hive version: CDH 5.7, Hive 1.1.1 (I see the same behavior on HDP)
Hive JDBC version: 1.1.1 (I also tried later versions)
The problem also occurs with later Hive versions. Could you help me with this issue? I didn't find anything in the mailing list answers or on StackOverflow. Maybe you know how I can execute Hive queries from Spark against remote servers?
Paul Staab replied to this issue in the Spark JIRA. Here is the solution:
Create a Hive dialect that uses the correct quotes (backticks) for escaping the column names:
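import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object HiveDialect extends JdbcDialect {
  // Apply this dialect to HiveServer2 JDBC URLs only
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  // Quote column names with backticks instead of the default double quotes
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}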
Register it before making the call with spark.read.jdbc:
JdbcDialects.registerDialect(HiveDialect)
Execute spark.read.jdbc with the fetchsize option (the call below uses PySpark syntax):
spark.read.jdbc("jdbc:hive2://localhost:10000/default","test1",properties={"driver": "org.apache.hive.jdbc.HiveDriver", "fetchsize": "10"}).show()
In your case, add this to your options:
.option("fetchsize", "10")
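Putting the steps together in Scala, a minimal sketch might look like the following. It assumes the Hive JDBC driver is on the classpath and reuses the placeholder URL, credentials, and table name from the question; the HiveJdbcExample object and the application name are hypothetical names added only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Dialect from the answer: quote identifiers with backticks for Hive
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Hypothetical wrapper object, used only to make the sketch self-contained
object HiveJdbcExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-jdbc-example") // assumed application name
      .getOrCreate()

    // Register the dialect before the first JDBC read
    JdbcDialects.registerDialect(HiveDialect)

    val test = spark.read
      .format("jdbc")
      .option("url", "jdbc:hive2://remote.hive.server:10000/work_base")
      .option("user", "user")
      .option("password", "password")
      .option("dbtable", "some_table_with_data")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("fetchsize", "10") // fetch size suggested in the answer
      .load()

    test.show()
    spark.stop()
  }
}
```

The value "10" for fetchsize is simply the one used in the answer; a different value may suit your tables better.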