I'm working with the Databricks spark-csv package (via the Scala API) and am having problems defining a custom schema.
After starting up the console with
spark-shell --packages com.databricks:spark-csv_2.11:1.2.0
I import my necessary types
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
and then simply try to define this schema:
val customSchema = StructType(
  StructField("user_id", IntegerType, true),
  StructField("item_id", IntegerType, true),
  StructField("artist_id", IntegerType, true),
  StructField("scrobble_time", StringType, true))
but I receive the following error:
<console>:26: error: overloaded method value apply with alternatives:
(fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
(fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
(fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField)
val customSchema = StructType(
I'm very new to Scala and am having trouble parsing this error. What am I doing wrong here? I'm following the very simple example here.
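You need to pass your StructFields as a single Seq. As the error message shows, every overload of StructType's apply method takes one collection (an Array, a java.util.List, or a Seq); none of them accepts the fields as separate arguments.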
Something like any of the following works:
val customSchema = StructType(Seq(StructField("user_id", IntegerType, true), StructField("item_id", IntegerType, true), StructField("artist_id", IntegerType, true), StructField("scrobble_time", StringType, true)))
val customSchema = (new StructType)
  .add("user_id", IntegerType, true)
  .add("item_id", IntegerType, true)
  .add("artist_id", IntegerType, true)
  .add("scrobble_time", StringType, true)
val customSchema = StructType(StructField("user_id", IntegerType, true) :: StructField("item_id", IntegerType, true) :: StructField("artist_id", IntegerType, true) :: StructField("scrobble_time", StringType, true) :: Nil)
I'm not sure why the README doesn't present it this way, but the StructType documentation is clear about this.
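Because StructType's apply method takes a single collection, the Seq can also be built programmatically rather than written out by hand. Here is a minimal sketch, assuming the same four columns as in the question; the `columns` value is just an illustrative name, not part of the Spark API:

```scala
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

// Column names paired with their types (illustrative helper value, not a Spark API).
val columns: Seq[(String, DataType)] = Seq(
  "user_id"       -> IntegerType,
  "item_id"       -> IntegerType,
  "artist_id"     -> IntegerType,
  "scrobble_time" -> StringType
)

// Map each pair to a nullable StructField, then pass the whole Seq to StructType.
val customSchema = StructType(columns.map { case (name, dataType) =>
  StructField(name, dataType, nullable = true)
})
```

This should give the same schema as the three variants above, and it can be handed to the reader's schema(customSchema) call in the same way.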