I'm trying to switch from reading CSV flat files to Avro files on Spark. Following https://github.com/databricks/spark-avro, I use:
import com.databricks.spark.avro._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.avro("gs://logs.xyz.com/raw/2016/04/20/div1/div2/2016-04-20-08-28-35.UTC.blah-blah.avro")
and get:
java.lang.UnsupportedOperationException: This mix of union types is not supported (see README): ArrayBuffer(STRING)
The README states clearly:
This library supports reading all Avro types, with the exception of complex union types.
When I try to read the same file as plain text, I can see the schema:
val df = sc.textFile("gs://logs.xyz.com/raw/2016/04/20/div1/div2/2016-04-20-08-28-35.UTC.blah-blah.avro")
df.take(2).foreach(println)
{"name":"log_record","type":"record","fields":[{"name":"request","type":{"type":"record","name":"request_data","fields":[{"name":"datetime","type":"string"},{"name":"ip","type":"string"},{"name":"host","type":"string"},{"name":"uri","type":"string"},{"name":"request_uri","type":"string"},{"name":"referer","type":"string"},{"name":"useragent","type":"string"}]}}
<------- an excerpt of the full schema ------->
Since I have little control over the format I receive these files in, my question here is: is there a workaround someone has tested and can recommend?
I use Google Cloud Dataproc with:
MASTER=yarn-cluster spark-shell --num-executors 4 --executor-memory 4G --executor-cores 4 --packages com.databricks:spark-avro_2.10:2.0.1,com.databricks:spark-csv_2.11:1.3.0
Any help would be greatly appreciated.
You won't find a solution that works with Spark SQL. Every column in Spark has to contain values which can be represented as a single DataType, so complex union types are simply not representable with a Spark DataFrame.
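To make that concrete, here is a hypothetical union built with Avro's Schema API (the string/long union is purely illustrative): no single Spark SQL DataType can hold both branches, so spark-avro cannot map such a field to a column. Only a few special cases, such as a union of a type with null (which maps to a nullable column), have a DataFrame equivalent.

import org.apache.avro.Schema
import java.util.Arrays

// A complex union: a value may be either a string or a long.
// There is no single Spark SQL DataType covering both branches.
val union = Schema.createUnion(Arrays.asList(
  Schema.create(Schema.Type.STRING),
  Schema.create(Schema.Type.LONG)))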
If you want to read data like this, you should use the RDD API and convert the loaded data to a DataFrame later; a sketch of that approach follows.
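Here is a minimal sketch of that route, assuming the request record from the schema excerpt in the question and picking only a few string fields; the LogRequest case class, the field subset, and the variable names are illustrative, not a tested recipe for this exact file. It relies on the avro-mapred input format being on the classpath:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

// Illustrative target row type, matching part of the "request" record.
case class LogRequest(datetime: String, ip: String, host: String, uri: String)

val avroRdd = sc.newAPIHadoopFile(
  "gs://logs.xyz.com/raw/2016/04/20/div1/div2/2016-04-20-08-28-35.UTC.blah-blah.avro",
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable])

// Extract plain values inside the map: the input format reuses record
// objects, so don't keep references to the GenericRecord itself.
val rows = avroRdd.map { case (key, _) =>
  val req = key.datum().get("request").asInstanceOf[GenericRecord]
  def str(field: String): String = Option(req.get(field)).map(_.toString).orNull
  LogRequest(str("datetime"), str("ip"), str("host"), str("uri"))
}

import sqlContext.implicits._
val df = rows.toDF()
df.printSchema()

Since the union members are extracted by hand (here via toString), you decide per field how to collapse each union into one column type, which is exactly the decision Spark SQL cannot make for you.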