Scenario:
Suppose there is a table in Hive that is queried from Apache Spark using the SparkSQL code below, where the table name is passed in as an argument and concatenated into the query string.
For a non-distributed system I have a basic understanding of SQL-injection vulnerabilities, and in the JDBC context I understand the use of createStatement/prepareStatement in that kind of scenario.
But what about this scenario in SparkSQL: is this code vulnerable? Any insights?
def main(args: Array[String]) {
  val sconf = new SparkConf().setAppName("TestApp")
  val sparkContext = new SparkContext(sconf)
  val hiveSqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
  val tableName = args(0) // passed as an argument
  val tableData = hiveSqlContext
    .sql("select IdNum, Name from hiveSchemaName." + tableName + " where IdNum <> '' ")
    .map( x => (x.getString(0), x.getString(1)) ).collectAsMap()
  ................
  ...............
}
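To see why the concatenation above is a risk, consider what an attacker-controlled args(0) can do to the query text. The sketch below (hypothetical object and method names, standalone Scala with no Spark dependency) shows the spliced query and one common mitigation: validating that the argument is a plain identifier before it ever reaches the SQL string.

```scala
// Sketch with hypothetical names: validate the table-name argument
// before concatenating it into a SQL string.
object TableNameGuard {
  // Accept only plain identifiers: a letter or underscore, then letters/digits/underscores.
  private val Identifier = "[A-Za-z_][A-Za-z0-9_]*".r

  def validate(tableName: String): String =
    tableName match {
      case Identifier() => tableName // regex pattern match requires a full-string match
      case _            => throw new IllegalArgumentException(s"Illegal table name: $tableName")
    }

  def main(args: Array[String]): Unit = {
    // A benign argument passes through unchanged.
    println(validate("employees"))

    // A malicious argument splices arbitrary SQL into the concatenated query:
    val malicious = "employees union select Login, Password from hiveSchemaName.users --"
    val query = "select IdNum, Name from hiveSchemaName." + malicious + " where IdNum <> '' "
    println(query) // the attacker now controls part of the statement text

    // validate(malicious) would throw IllegalArgumentException instead.
  }
}
```

The guard does not make string concatenation safe in general; it just shrinks the input space to strings that cannot contain SQL metacharacters.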
You can try the following in Spark 2.0:
def main(args: Array[String]) {
  val conf = new SparkConf()
  val sparkSession = SparkSession
    .builder()
    .appName("TestApp")
    .config(conf)
    .enableHiveSupport()
    .getOrCreate()

  import sparkSession.implicits._ // required for the $"colName" syntax

  val tableName = args(0) // passed as an argument
  val tableData = sparkSession
    .table(tableName)
    .select($"IdNum", $"Name")
    .filter($"IdNum" =!= "")
    .map( x => (x.getString(0), x.getString(1)) ).collectAsMap()
  ................
  ...............
}
In Java, the most common way to handle SQL-injection threats is to use prepared statements.
You can use the Java JDBC classes (java.sql.PreparedStatement) directly from Scala, or look for Scala libraries that wrap prepared statements. Since Scala is also widely used in web applications, such libraries exist.
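One caveat worth noting: JDBC placeholders bind *values*, not identifiers, so a table name cannot be passed as a ? parameter even with a prepared statement. For identifiers like table names, a common alternative is a lookup from user input to a fixed set of known tables. A minimal sketch (hypothetical object and table names):

```scala
// Sketch with hypothetical names: resolve user input to a table from a
// fixed whitelist, since a table name cannot be a JDBC '?' parameter.
object TableWhitelist {
  // Map from the externally visible name to the fully qualified table.
  private val allowed: Map[String, String] = Map(
    "employees"   -> "hiveSchemaName.employees",
    "departments" -> "hiveSchemaName.departments"
  )

  def resolve(userInput: String): String =
    allowed.getOrElse(
      userInput,
      throw new IllegalArgumentException(s"Unknown table: $userInput")
    )
}
```

With this in place, only the fixed strings on the right-hand side of the map ever reach the query, regardless of what the caller supplies.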