Ambiguous schema in Spark Scala

Schema:

|-- c0: string (nullable = true)
|-- c1: struct (nullable = true)
|    |-- c2: array (nullable = true)
|    |    |-- element: struct (containsNull = true)
|    |    |    |-- orangeID: string (nullable = true)
|    |    |    |-- orangeId: string (nullable = true)

I am trying to flatten the schema above in Spark.

Code:

var df = data
  .select($"c0", $"c1.*")
  .select($"c0", explode($"c2"))
  .select($"c0", $"col.orangeID", $"col.orangeId")

The flattening code works fine. The problem is in the last part, where the two columns differ only in the case of one letter (orangeID vs. orangeId). Hence I am getting this error:

Error:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(orangeID,StringType,true), StructField(orangeId,StringType,true);

Any suggestions to avoid this ambiguity will be great.

Asked Aug 29 '18 by data_person

People also ask

How does spark infer the schema?

Inferring the schema using reflection: the Scala interface for Spark SQL supports automatically converting an RDD containing case classes to a DataFrame. The case class defines the schema of the table. The names of the arguments to the case class are read using reflection and become the names of the columns.
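The reflection-based inference described above can be sketched as follows. This is a minimal sketch, assuming a SparkSession named `spark` is already in scope; the case class `Orange` and its field names are illustrative only (chosen to mirror the question's schema):

```scala
// Assumes an existing SparkSession `spark`; import its implicits for toDF().
import spark.implicits._

// The case class's field names become the DataFrame's column names.
case class Orange(orangeID: String, orangeId: String)

val df = Seq(Orange("a", "b")).toDF()
df.printSchema()
// The printed schema lists orangeID and orangeId as string columns.
```

Note that this only works for case classes whose fields are types Spark SQL can encode (primitives, strings, Seqs, nested case classes, etc.).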

How do I merge two spark data frames?

PySpark's join is used to combine two DataFrames, and by chaining joins you can combine multiple DataFrames; it supports all basic join types available in traditional SQL, such as INNER, LEFT OUTER, RIGHT OUTER, LEFT ANTI, LEFT SEMI, CROSS, and SELF JOIN.
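The same applies to the Scala API. A minimal sketch, assuming a SparkSession `spark` is in scope (the column names and sample data are made up for illustration):

```scala
import spark.implicits._

val left  = Seq((1, "apple"), (2, "pear")).toDF("id", "fruit")
val right = Seq((1, "red"), (3, "green")).toDF("id", "color")

// The third argument selects the join type: "inner", "left_outer",
// "right_outer", "left_anti", "left_semi", "cross", ...
val joined = left.join(right, Seq("id"), "inner")
```

Joining on `Seq("id")` (rather than an expression) also avoids a duplicated `id` column in the result.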

What is schema in Scala?

Scala Schemas leverages Scala's class definition syntax, which includes the ability to specify defaults, along with Scala's implicit parameter resolution to safely interact with external protocols and systems. Currently supported systems are: Scalding Type Safe API: Parquet and Tuple Sources. Hive.


1 Answer

Turn on the Spark SQL case-sensitivity configuration and try:

spark.sql("set spark.sql.caseSensitive=true")
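Putting the fix together with the question's flattening code, a sketch might look like this (assuming `data` is the original DataFrame and a SparkSession `spark` is in scope):

```scala
import org.apache.spark.sql.functions.explode
import spark.implicits._

// With case-sensitive resolution enabled, orangeID and orangeId are
// treated as distinct fields and the ambiguity error goes away.
spark.sql("set spark.sql.caseSensitive=true")
// equivalently: spark.conf.set("spark.sql.caseSensitive", "true")

var df = data
  .select($"c0", $"c1.*")
  .select($"c0", explode($"c2"))
  .select($"c0", $"col.orangeID", $"col.orangeId")
```

Note that `spark.sql.caseSensitive` is a session-wide setting, so it affects name resolution for all subsequent queries in that session, not just this one.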
Answered Oct 08 '22 by Chandan Ray