
org.apache.spark.sql.AnalysisException: cannot resolve given input column

I have a Spark program that reads CSV files and loads them into DataFrames. Once loaded, I manipulate them with Spark SQL.

When I run the Spark job, it fails with the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve 'action' given input columns ["alpha", "beta", "gamma", "delta", "action"]

The exception is thrown when Spark SQL tries to parse the following query:

SELECT *, 
  IF(action = 'A', 1, 0) a_count,
  IF(action = 'B', 1, 0) b_count,
  IF(action = 'C', 1, 0) c_count,
  IF(action = 'D', 1, 0) d_count,
  IF(action = 'E', 1, 0) e_count
FROM my_table

This code worked fine before updating to Spark 2.0. Does anyone have any idea what would cause this issue?

Edit: I'm loading the CSV files using the Databricks CSV parser:

sqlContext.read().format("csv")
    .option("header", "false")
    .option("inferSchema", "false")
    .option("parserLib", "univocity")
    .load(pathToLoad);
asked Feb 25 '26 by dmux

1 Answer

Try wrapping the column name in backquotes (backticks) in your selection:

SELECT *, 
  IF(`action` = 'A', 1, 0) a_count,
  IF(`action` = 'B', 1, 0) b_count,
  IF(`action` = 'C', 1, 0) c_count,
  IF(`action` = 'D', 1, 0) d_count,
  IF(`action` = 'E', 1, 0) e_count
FROM my_table

Quoting identifiers with backquotes works the same way in some other databases, such as MySQL.
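If you build the SQL string in code, it can help to quote identifiers consistently. A minimal sketch of a hypothetical helper (not part of Spark's API) that wraps a name in backquotes, doubling any embedded backquote, which is how the Spark SQL parser expects a backtick to be escaped inside a quoted identifier:

```java
public class BackquoteExample {
    // Hypothetical helper: quote an identifier for Spark SQL.
    // An embedded backquote is escaped by doubling it.
    static String bq(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        String sql = "SELECT *, IF(" + bq("action") + " = 'A', 1, 0) a_count FROM my_table";
        System.out.println(sql);
        // prints: SELECT *, IF(`action` = 'A', 1, 0) a_count FROM my_table
    }
}
```

Backquoting also makes problems like trailing whitespace in a column name visible, since `` `action ` `` and `` `action` `` are then clearly different identifiers.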

answered Mar 03 '26 by xmar

