I have a Spark program that reads CSV files into DataFrames. Once loaded, I manipulate them using Spark SQL.
When I run the Spark job, it fails with the following exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'action' given input columns ["alpha", "beta", "gamma", "delta", "action"]
The exception above is thrown when Spark SQL tries to parse the following query:
SELECT *,
IF(action = 'A', 1, 0) a_count,
IF(action = 'B', 1, 0) b_count,
IF(action = 'C', 1, 0) c_count,
IF(action = 'D', 1, 0) d_count,
IF(action = 'E', 1, 0) e_count
FROM my_table
This code worked fine before I upgraded to Spark 2.0. Does anyone have an idea what could cause this issue?
Edit: I'm loading the CSV files using the Databricks CSV parser:
sqlContext.read().format("csv")
.option("header", "false")
.option("inferSchema", "false")
.option("parserLib", "univocity")
.load(pathToLoad);
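One thing worth checking first (an assumption on my part, not confirmed from the post): "cannot resolve" errors where the name *does* appear in the input-column list are often caused by an invisible character in the actual column name, such as a UTF-8 BOM or a non-breaking space. A minimal Python sketch to spot this, assuming you can paste the schema's column names into a list (the names below are hypothetical):

```python
# Sketch: spot invisible characters in column names.
# Paste the real names here, e.g. from df.schema or the error message.
names = ["alpha", "beta", "gamma", "delta", "\ufeffaction"]  # BOM hidden in last name

for name in names:
    # repr() makes BOMs, non-breaking spaces, and trailing blanks visible
    if name != name.strip() or not name.isascii():
        print(f"suspicious column: {name!r}")
    else:
        print(f"ok: {name!r}")
```

If a name prints with extra characters in its repr, the fix is to clean the schema or rename the column, not to change the query.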
Try adding backquotes (backticks) around the column name, so the parser treats action as an identifier rather than a keyword:
SELECT *,
IF(`action` = 'A', 1, 0) a_count,
IF(`action` = 'B', 1, 0) b_count,
IF(`action` = 'C', 1, 0) c_count,
IF(`action` = 'D', 1, 0) d_count,
IF(`action` = 'E', 1, 0) e_count
FROM my_table
The same identifier-quoting syntax applies to some databases, such as MySQL, as well.
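If you build queries like this programmatically, it is safer to quote every column name up front. A small sketch (the helper name quote_ident and the action codes are my own, not from the original post); note that Spark SQL, like MySQL, escapes a literal backquote inside a quoted identifier by doubling it:

```python
def quote_ident(name: str) -> str:
    # Wrap a column name in backquotes, doubling any embedded backquote.
    return "`" + name.replace("`", "``") + "`"

actions = ["A", "B", "C", "D", "E"]
clauses = ", ".join(
    f"IF({quote_ident('action')} = '{a}', 1, 0) {a.lower()}_count" for a in actions
)
sql = f"SELECT *, {clauses} FROM my_table"
print(sql)
```

This produces the same backquoted query as above and keeps working even if a column name collides with a reserved word.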