Enable case sensitivity for spark.sql globally

The option spark.sql.caseSensitive controls whether column names and the like are treated as case sensitive. It can be set, for example, with

spark_session.sql('set spark.sql.caseSensitive=true')

and is false by default.
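For illustration, here is a minimal sketch of what the flag changes; the DataFrame and its column names are made up for this example:

# Assumes an existing SparkSession named spark_session; the DataFrame is
# purely illustrative.
df = spark_session.createDataFrame([(1, 'a')], ['Id', 'Value'])

spark_session.sql('set spark.sql.caseSensitive=false')
df.select('id')   # resolves: 'id' matches 'Id' case-insensitively

spark_session.sql('set spark.sql.caseSensitive=true')
df.select('id')   # now fails with an AnalysisException: cannot resolve 'id'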

It does not seem to be possible, though, to enable it globally in $SPARK_HOME/conf/spark-defaults.conf with

spark.sql.caseSensitive: True

Is that intended, or is there some other file for setting SQL options?

The source code also states that enabling this at all is highly discouraged. What is the rationale behind that advice?

asked Mar 22 '17 by karlson

3 Answers

Yet another way for PySpark, using a SparkSession object named spark:

spark.conf.set('spark.sql.caseSensitive', True)
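If you want the option in place from the start, it can also be passed when the session is built; this is just a sketch, and the app name is a placeholder:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('my-app')                        # placeholder name
         .config('spark.sql.caseSensitive', 'true')
         .getOrCreate())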
answered Oct 09 '22 by Ankur


As it turns out, setting

spark.sql.caseSensitive: True

in $SPARK_HOME/conf/spark-defaults.conf DOES work after all. It just has to be done in the configuration of the Spark driver as well, not only on the master or workers. Apparently I had forgotten that when I last tried.
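A quick way to check that the value from spark-defaults.conf actually reached the driver is to read it back from the running session (a sketch, assuming a SparkSession named spark):

# Returns the effective value as a string, e.g. 'true' if the driver
# picked up spark-defaults.conf.
spark.conf.get('spark.sql.caseSensitive')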

answered Oct 09 '22 by karlson


Try sqlContext.sql("set spark.sql.caseSensitive=true") in your Python code; that worked for me.
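For completeness, here is a minimal sketch of how such a sqlContext is typically obtained in older PySpark code; the app name is just a placeholder:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName='my-app')   # placeholder app name
sqlContext = SQLContext(sc)
sqlContext.sql("set spark.sql.caseSensitive=true")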

answered Oct 09 '22 by Jie