overloaded method error using spark-csv

I'm working with the Databricks spark-csv package (via the Scala API) and am having problems defining a custom schema.

After starting up the console with

spark-shell --packages com.databricks:spark-csv_2.11:1.2.0

I import the types I need:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

and then simply try to define this schema:

val customSchema = StructType(
    StructField("user_id", IntegerType, true),
    StructField("item_id", IntegerType, true),
    StructField("artist_id", IntegerType, true),
    StructField("scrobble_time", StringType, true))

but I receive the following error:

<console>:26: error: overloaded method value apply with alternatives:
  (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
  (fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
  (fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
 cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField)
       val customSchema = StructType(

I'm very new to Scala, so I'm having trouble parsing this. What am I doing wrong here? I'm following the very simple example in the spark-csv README.

asked by moustachio, Dec 18 '22 22:12


1 Answer

You need to pass your StructFields as a single Seq (one argument), not as four separate arguments.
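In Scala terms, the error says that every StructType.apply overload has exactly one parameter whose type is a collection of fields, while the call in the question supplies four separate arguments. A comma-separated call would only compile against a varargs parameter (written `T*`), which StructType.apply does not offer. The Spark-free sketch below illustrates the difference; `Field`, `Schema`, `fromSeq` and `fromVarargs` are hypothetical names invented for this illustration and are not part of Spark.

```scala
// Hypothetical stand-ins for StructField / StructType, so this runs without Spark.
final case class Field(name: String)

object Schema {
  // Same shape as StructType.apply(fields: Seq[StructField]):
  // exactly ONE parameter, whose type is a collection.
  def fromSeq(fields: Seq[Field]): List[String] = fields.map(_.name).toList

  // A varargs parameter (the trailing `*`) is what a comma-separated
  // call would need; StructType.apply has no such overload.
  def fromVarargs(fields: Field*): List[String] = fields.map(_.name).toList
}

val userId = Field("user_id")
val itemId = Field("item_id")

// Schema.fromSeq(userId, itemId)                   // does not compile: two arguments, one parameter
val viaSeq = Schema.fromSeq(Seq(userId, itemId))    // compiles: one Seq argument
val viaVarargs = Schema.fromVarargs(userId, itemId) // compiles: varargs accepts separate arguments
```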

Any of the following works:

val customSchema = StructType(Seq(
  StructField("user_id", IntegerType, true),
  StructField("item_id", IntegerType, true),
  StructField("artist_id", IntegerType, true),
  StructField("scrobble_time", StringType, true)))

val customSchema = (new StructType)
  .add("user_id", IntegerType, true)
  .add("item_id", IntegerType, true)
  .add("artist_id", IntegerType, true)
  .add("scrobble_time", StringType, true)

val customSchema = StructType(
  StructField("user_id", IntegerType, true) ::
  StructField("item_id", IntegerType, true) ::
  StructField("artist_id", IntegerType, true) ::
  StructField("scrobble_time", StringType, true) :: Nil)
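The error message in the question also lists two other overloads, taking an Array[StructField] or a java.util.List[StructField]. A minimal sketch showing all three single-argument forms, assuming the spark-sql types are on the classpath (as they are in spark-shell); no SparkContext is needed just to build a schema:

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Build the fields once; nullable = true, as in the question
val fields = Array(
  StructField("user_id", IntegerType, nullable = true),
  StructField("item_id", IntegerType, nullable = true),
  StructField("artist_id", IntegerType, nullable = true),
  StructField("scrobble_time", StringType, nullable = true))

// Overload taking Array[StructField]
val fromArray = StructType(fields)

// Overload taking java.util.List[StructField]
val fromJavaList = StructType(java.util.Arrays.asList(fields: _*))

// Overload taking Seq[StructField]
val fromSeq = StructType(fields.toSeq)
```

Whichever form you pick, StructType receives exactly one collection argument.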

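I'm not sure why the README doesn't present it this way, but the StructType documentation is clear about it: each apply overload listed in your error takes one collection of fields (a Seq, an Array, or a java.util.List), not a variable number of StructField arguments.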
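Because apply takes one collection, the field list can also be generated from ordinary Scala data with map. A sketch assuming spark-sql is on the classpath (as in spark-shell); the helper name `schemaFrom` is hypothetical, and every column is made nullable as in the question:

```scala
import org.apache.spark.sql.types.{DataType, StructType, StructField, StringType, IntegerType}

// Hypothetical helper (not part of Spark): turns (name, type) pairs into a schema.
def schemaFrom(columns: Seq[(String, DataType)]): StructType = {
  val fields: Seq[StructField] = columns.map { case (name, dataType) =>
    StructField(name, dataType, nullable = true)
  }
  StructType(fields) // a single Seq[StructField] argument
}

val customSchema = schemaFrom(Seq(
  "user_id" -> IntegerType,
  "item_id" -> IntegerType,
  "artist_id" -> IntegerType,
  "scrobble_time" -> StringType))
```

Keeping the column list as data makes it easy to add or reorder columns without touching the StructType call.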

answered by Rohan Aletty, Jan 04 '23 04:01