Integrating Spark SQL and Apache Drill through JDBC

I would like to create a Spark SQL DataFrame from the results of a query performed over CSV data (on HDFS) with Apache Drill. I successfully configured Spark SQL to make it connect to Drill via JDBC:

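import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.DataFrame;

// JDBC options: the Drill connection URL and the table (or view) to read
Map<String, String> connectionOptions = new HashMap<String, String>();
connectionOptions.put("url", args[0]);
connectionOptions.put("dbtable", args[1]);
connectionOptions.put("driver", "org.apache.drill.jdbc.Driver");

// sqlc is an existing SQLContext
DataFrame logs = sqlc.read().format("jdbc").options(connectionOptions).load();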

Spark SQL performs two queries: the first one to get the schema, and the second one to retrieve the actual data:

SELECT * FROM (SELECT * FROM dfs.output.`my_view`) WHERE 1=0

SELECT "field1","field2","field3" FROM (SELECT * FROM dfs.output.`my_view`)

The first query succeeds, but in the second one Spark encloses the column names in double quotes, which Drill does not accept as identifier quotes, so the query fails.

Has anyone managed to get this integration working?

Thank you!

asked Feb 18 '16 08:02

Skice

1 Answer

You can add a custom JDBC dialect for this and register it before using the JDBC connector:

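import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Custom dialect that stops Spark from wrapping column names in double quotes
case object DrillDialect extends JdbcDialect {

  // Apply this dialect only to Drill JDBC URLs
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:drill:")

  // Return the column name as-is, without any quoting
  override def quoteIdentifier(colName: String): String = colName

  // Convenience accessor for the singleton
  def instance = this
}

// Register the dialect before creating the JDBC DataFrame
JdbcDialects.registerDialect(DrillDialect)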
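
To see why overriding quoteIdentifier helps, here is a small plain-Java sketch of how the column list of the second query changes with the quoting rule. It is a simplified model, not Spark's actual code: the class QuotingSketch and the helper buildSelect are hypothetical names made up for illustration. The backtick variant is an assumption on my part, based on the backticks already used around `my_view` in the question; it may be a safer choice than no quoting if a column name clashes with a reserved word.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Simplified, hypothetical model of how the data query is assembled.
class QuotingSketch {

    // Behaviour described in the question: column names wrapped in double quotes.
    static final UnaryOperator<String> DEFAULT_QUOTING = col -> "\"" + col + "\"";

    // What the DrillDialect in the answer does: leave the name untouched.
    static final UnaryOperator<String> NO_QUOTING = col -> col;

    // Assumed alternative: wrap the name in backticks.
    // This sketch does not escape backticks inside the name itself.
    static final UnaryOperator<String> BACKTICK_QUOTING = col -> "`" + col + "`";

    // Applies the quoting rule to every column and joins them into a SELECT.
    static String buildSelect(String table, String[] columns, UnaryOperator<String> quote) {
        String columnList = Arrays.stream(columns)
                .map(quote)
                .collect(Collectors.joining(","));
        return "SELECT " + columnList + " FROM " + table;
    }

    public static void main(String[] args) {
        String table = "(SELECT * FROM dfs.output.`my_view`)";
        String[] columns = {"field1", "field2", "field3"};

        // Reproduces the failing query from the question.
        System.out.println(buildSelect(table, columns, DEFAULT_QUOTING));
        // Query shape with the dialect from the answer registered.
        System.out.println(buildSelect(table, columns, NO_QUOTING));
        // Query shape with a backtick-quoting dialect instead.
        System.out.println(buildSelect(table, columns, BACKTICK_QUOTING));
    }
}
```

If you prefer the backtick variant, the only change in the Scala dialect above would be the body of quoteIdentifier.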

answered Oct 08 '22 20:10

zvee

Tags: jdbc, apache-spark, apache-spark-sql, hadoop, apache-drill
