
Spark case class - decimal type encoder error "Cannot up cast from decimal"

I'm extracting data from MySQL/MariaDB, and while creating a Dataset I get a data-type error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast AMOUNT from decimal(30,6) to decimal(38,18) as it may truncate The type path of the target object is: - field (class: "org.apache.spark.sql.types.Decimal", name: "AMOUNT") - root class: "com.misp.spark.Deal" You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
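The error comes from the shape of the two decimal types: a decimal(p, s) can hold p − s digits before the decimal point, so decimal(30, 6) allows 24 integer digits while decimal(38, 18) allows only 20, and large values could truncate. A minimal sketch of that rule in plain Scala (no Spark needed; `safeUpcast` is an illustrative helper, not a Spark API):

```scala
object DecimalFit {
  // A value of decimal(fromP, fromS) always fits into decimal(toP, toS) only
  // if the target allows at least as many integer digits (p - s) and at least
  // as many fractional digits (s) as the source.
  def safeUpcast(fromP: Int, fromS: Int, toP: Int, toS: Int): Boolean =
    (fromP - fromS) <= (toP - toS) && fromS <= toS

  def main(args: Array[String]): Unit = {
    // decimal(30,6) has 24 integer digits; decimal(38,18) only 20 -> unsafe
    assert(!safeUpcast(30, 6, 38, 18))
    // decimal(20,6) -> decimal(38,18) is safe: 14 <= 20 and 6 <= 18
    assert(safeUpcast(20, 6, 38, 18))
    println("ok")
  }
}
```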

The case class is defined like this (the error names the root class com.misp.spark.Deal):

case class Deal
(
  AMOUNT: Decimal
)

Does anyone know how to fix this without touching the database?

asked Dec 03 '16 by mispp

1 Answer

Building on @user2737635's answer, you can use foldLeft rather than foreach, which avoids declaring your DataFrame as a var and repeatedly reassigning it:

// First, read your data into a DataFrame in whatever way suits you
val df: DataFrame = ???
val dfSchema = df.schema

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Walk the schema once, casting every decimal column that is not already
// decimal(38,18) so the Dataset encoder can map it onto a BigDecimal field
dfSchema.foldLeft(df) { (dataframe, field) =>
  field.dataType match {
    case t: DecimalType if t != DecimalType(38, 18) =>
      dataframe.withColumn(field.name, col(field.name).cast(DecimalType(38, 18)))
    case _ => dataframe
  }
}.as[YourCaseClassWithBigDecimal]
answered Sep 27 '22 by AdamAbrahams