I'm trying to read from SAS IOM using Spark JDBC. The problem is that the SAS JDBC driver is a bit strange, so I need to create my own dialect:
object SasDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sasiom")
  override def quoteIdentifier(colName: String): String = "\"" + colName + "\"n"
}
However, this is not enough. SAS makes a distinction between column labels (human-readable names) and column names (the names you use in a SQL query), but it seems that Spark uses the column labels instead of the names during schema discovery; see the JdbcUtils extract below:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L293
while (i < ncols) {
val columnName = rsmd.getColumnLabel(i + 1)
This results in SQL errors because Spark puts the human-readable label into the generated SQL.
For SAS IOM JDBC to work, this would need to be getColumnName instead of getColumnLabel. Is there a way to specify this in the dialect? I could not really find a way to hook into this, apart from wrapping the whole com.sas.rio.MVADriver and its ResultSetMetaData.
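Just to illustrate how much wrapping that would involve: the metadata part alone would look roughly like the untested sketch below (the helper name nameOnlyMetaData is made up, and you would still have to wrap the driver, connection, statement and result set to get Spark to actually use it).

import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.ResultSetMetaData

// Untested sketch: proxy a ResultSetMetaData so that getColumnLabel
// falls back to the real column name instead of the SAS label.
def nameOnlyMetaData(delegate: ResultSetMetaData): ResultSetMetaData = {
  val handler = new InvocationHandler {
    override def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef = {
      if (method.getName == "getColumnLabel")
        delegate.getColumnName(args(0).asInstanceOf[Int])
      else {
        val safeArgs = if (args == null) Array.empty[AnyRef] else args
        method.invoke(delegate, safeArgs: _*)
      }
    }
  }
  Proxy.newProxyInstance(
    classOf[ResultSetMetaData].getClassLoader,
    Array(classOf[ResultSetMetaData]),
    handler).asInstanceOf[ResultSetMetaData]
}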
Frank
In the meantime I found out how to do it, so I'm just posting it here for reference. The trick is to register your own dialect, as below.
Also, SAS pads all VARCHAR columns with trailing spaces, so I trim all string columns.
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

def getSasTable(sparkSession: SparkSession, tablename: String): DataFrame = {
  val host = "dwhid94.msnet.railb.be"
  val port = "48593"
  val sasconurl = s"jdbc:sasiom://$host:$port"

  // Dialect for the SAS IOM driver: quote identifiers as SAS name literals ("name"n)
  object SasDialect extends JdbcDialect {
    override def canHandle(url: String): Boolean = url.startsWith("jdbc:sasiom")
    override def quoteIdentifier(colName: String): String = "\"" + colName + "\"n"
  }
  // Register the dialect before the read so Spark picks it up for jdbc:sasiom URLs
  JdbcDialects.registerDialect(SasDialect)

  val df = sparkSession.read
    .format("jdbc")
    .option("url", sasconurl)
    .option("driver", "com.sas.rio.MVADriver")
    .option("dbtable", tablename)
    .option("user", CredentialsStore.getUsername("sas"))
    .option("password", CredentialsStore.getPassword("sas"))
    .option("fetchsize", 100)
    .load()

  // SAS pads character columns to their full length, so trim every string value
  sparkSession.createDataFrame(
    df.rdd.map(r => Row(r.toSeq.map {
      case s: String => s.trim
      case x         => x
    }: _*)),
    df.schema)
}
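Usage is then just something like the following (the table name is a placeholder; use your own libref.table):

// assuming `spark` is an existing SparkSession (e.g. the one from spark-shell)
val sasDf = getSasTable(spark, "MYLIB.MYTABLE")  // "MYLIB.MYTABLE" is a placeholder
sasDf.printSchema()
sasDf.show(10, truncate = false)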