Read data from remote hive on spark over JDBC returns empty result

I need to execute Hive queries on a remote Hive server from Spark, but for some reason I receive only the column names (without data). The table does contain data; I checked it via HUE and a plain Java JDBC connection.

Here is my code example:

val test = spark.read
    .option("url", "jdbc:hive2://remote.hive.server:10000/work_base")
    .option("user", "user")
    .option("password", "password")
    .option("dbtable", "some_table_with_data")
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .format("jdbc")
    .load()
test.show()

Output:

+-------+
|dst.col|
+-------+
+-------+

I know that data is available in this table.

Scala version: 2.11
Spark version: 2.1.0 (I also tried 2.1.1)
Hive version: CDH 5.7, Hive 1.1.1 (same behavior on HDP)
Hive JDBC version: 1.1.1 (I also tried later versions)

The problem also occurs with later Hive versions. Could you help me with this issue? I didn't find anything in the mailing list answers or on StackOverflow. Maybe you know how I can execute Hive queries from Spark against remote servers?

userxuser asked Jun 08 '17 11:06


2 Answers

Paul Staab replied on this issue in the Spark JIRA. Here is the solution:

  1. Create a Hive dialect that uses the correct quotes for escaping column names:

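    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

    // Hive expects backtick-quoted identifiers.
    object HiveDialect extends JdbcDialect {

        override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")

        override def quoteIdentifier(colName: String): String = s"`$colName`"
    }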
    
  2. Register it before calling spark.read.jdbc:

    JdbcDialects.registerDialect(HiveDialect)
    
  3. Execute spark.read.jdbc with the fetchsize option (the call below uses PySpark syntax):

    spark.read.jdbc("jdbc:hive2://localhost:10000/default","test1",properties={"driver": "org.apache.hive.jdbc.HiveDriver", "fetchsize": "10"}).show()
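
Since the question uses Scala, the three steps above can be combined into one Scala snippet. This is a minimal sketch, not the JIRA author's exact code: the wrapper object `HiveJdbcReader` and its method `readHiveTable` are hypothetical names, and the fetch size is the placeholder value from step 3. It assumes Spark SQL and the Hive JDBC driver are on the classpath.

```scala
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Step 1: dialect that quotes identifiers with backticks, as HiveQL expects.
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Hypothetical helper wrapping steps 2 and 3.
object HiveJdbcReader {
  def readHiveTable(spark: SparkSession, url: String, table: String): DataFrame = {
    // Step 2: register the dialect before the JDBC read.
    JdbcDialects.registerDialect(HiveDialect)

    // Step 3: pass the driver class and fetchsize as connection properties.
    val props = new Properties()
    props.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")
    props.setProperty("fetchsize", "10")
    spark.read.jdbc(url, table, props)
  }
}
```

With an active session this could be called as `HiveJdbcReader.readHiveTable(spark, "jdbc:hive2://localhost:10000/default", "test1").show()`. The dialect matters because Spark's default JDBC dialect wraps column names in double quotes, which Hive reads as string literals rather than identifiers.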
    
userxuser answered Oct 21 '22 08:10


You should add this to your options:

 .option("fetchsize", "10")
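
Applied to the reader from the question, this might look as follows. It is only a sketch: the object name `FetchsizeExample` and its members are hypothetical, the connection values are the placeholders from the question, and whether "10" is a good fetch size for a given table is an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FetchsizeExample {
  // Connection options from the question, plus the suggested fetchsize.
  val hiveOptions: Map[String, String] = Map(
    "url"       -> "jdbc:hive2://remote.hive.server:10000/work_base",
    "user"      -> "user",
    "password"  -> "password",
    "dbtable"   -> "some_table_with_data",
    "driver"    -> "org.apache.hive.jdbc.HiveDriver",
    "fetchsize" -> "10"
  )

  // Builds the same JDBC reader as in the question, with all options applied at once.
  def load(spark: SparkSession): DataFrame =
    spark.read.format("jdbc").options(hiveOptions).load()
}
```

Calling `FetchsizeExample.load(spark).show()` would then replace the `test.show()` call from the question.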
Feiran answered Oct 21 '22 09:10