I am trying to move data from Greenplum to HDFS using Spark. I can read the data from the source table successfully, and the Spark-inferred schema of the DataFrame (built from the Greenplum table) is:
DataFrame Schema:
je_header_id: long (nullable = true)
je_line_num: long (nullable = true)
last_updated_by: decimal(15,0) (nullable = true)
last_updated_by_name: string (nullable = true)
ledger_id: long (nullable = true)
code_combination_id: long (nullable = true)
balancing_segment: string (nullable = true)
cost_center_segment: string (nullable = true)
period_name: string (nullable = true)
effective_date: timestamp (nullable = true)
status: string (nullable = true)
creation_date: timestamp (nullable = true)
created_by: decimal(15,0) (nullable = true)
entered_dr: decimal(38,20) (nullable = true)
entered_cr: decimal(38,20) (nullable = true)
entered_amount: decimal(38,20) (nullable = true)
accounted_dr: decimal(38,20) (nullable = true)
accounted_cr: decimal(38,20) (nullable = true)
accounted_amount: decimal(38,20) (nullable = true)
xx_last_update_log_id: integer (nullable = true)
source_system_name: string (nullable = true)
period_year: decimal(15,0) (nullable = true)
period_num: decimal(15,0) (nullable = true)
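For context, a minimal sketch of how a table like this might be read from Greenplum over plain JDBC; the URL, credentials, and table name below are placeholders, not taken from the original post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gp-to-hdfs").getOrCreate()

// Greenplum speaks the PostgreSQL wire protocol, so a plain JDBC read works;
// the connection details here are illustrative only.
val sourceDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://gp-host:5432/mydb")
  .option("dbtable", "gl.je_lines")
  .option("user", "gpuser")
  .option("password", "secret")
  .load()

sourceDf.printSchema()   // prints the inferred schema shown above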
The corresponding schema of the Hive table is:
je_header_id:bigint|je_line_num:bigint|last_updated_by:bigint|last_updated_by_name:string|ledger_id:bigint|code_combination_id:bigint|balancing_segment:string|cost_center_segment:string|period_name:string|effective_date:timestamp|status:string|creation_date:timestamp|created_by:bigint|entered_dr:double|entered_cr:double|entered_amount:double|accounted_dr:double|accounted_cr:double|accounted_amount:double|xx_last_update_log_id:int|source_system_name:string|period_year:bigint|period_num:bigint
Using the Hive table schema mentioned above, I built the StructType below with the following conversion logic:
import org.apache.spark.sql.types._

// Maps a Hive type name to the corresponding Spark SQL DataType.
def convertDatatype(datatype: String): DataType = {
  datatype match {
    case "string"    => StringType
    case "bigint"    => LongType
    case "int"       => IntegerType
    case "double"    => DoubleType
    case "date"      => TimestampType
    case "boolean"   => BooleanType
    case "timestamp" => TimestampType
  }
}
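A sketch of how the pipe-delimited Hive schema string above can be turned into a StructType with this function (the variable names are illustrative):

import org.apache.spark.sql.types.{StructField, StructType}

// hiveSchemaStr is the "name:type|name:type|..." string shown above.
val newSchema = StructType(
  hiveSchemaStr.split("\\|").map { column =>
    val Array(name, hiveType) = column.split(":")
    StructField(name, convertDatatype(hiveType), nullable = true)
  }
)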
Prepared Schema:
je_header_id: long (nullable = true)
je_line_num: long (nullable = true)
last_updated_by: long (nullable = true)
last_updated_by_name: string (nullable = true)
ledger_id: long (nullable = true)
code_combination_id: long (nullable = true)
balancing_segment: string (nullable = true)
cost_center_segment: string (nullable = true)
period_name: string (nullable = true)
effective_date: timestamp (nullable = true)
status: string (nullable = true)
creation_date: timestamp (nullable = true)
created_by: long (nullable = true)
entered_dr: double (nullable = true)
entered_cr: double (nullable = true)
entered_amount: double (nullable = true)
accounted_dr: double (nullable = true)
accounted_cr: double (nullable = true)
accounted_amount: double (nullable = true)
xx_last_update_log_id: integer (nullable = true)
source_system_name: string (nullable = true)
period_year: long (nullable = true)
period_num: long (nullable = true)
When I try to apply my new schema to the DataFrame, I get an exception:
java.lang.RuntimeException: java.math.BigDecimal is not a valid external type for schema of bigint
I understand that it fails while trying to convert BigDecimal to bigint, but could anyone tell me how to cast the bigint to a Spark-compatible datatype?
If not, how can I modify my logic to produce the proper datatypes in the case statement for this bigint/BigDecimal problem?
From your question, it looks like you are trying to convert a bigint value to a BigDecimal, which is not right. BigDecimal is a decimal with a fixed precision (the maximum number of digits) and scale (the number of digits to the right of the decimal point), whereas your value looks like a plain long.
Instead of the BigDecimal datatype, use LongType to convert the bigint values correctly, and see if that solves your problem.
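As a sketch of that idea, assuming the Hive schema was parsed into a StructType called newSchema as in the question and the source DataFrame is sourceDf, cast each column to its Hive target type instead of re-applying a schema to the existing rows; the decimal(15,0) columns then become longs and the decimal(38,20) columns become doubles:

import org.apache.spark.sql.functions.col

// Cast every column of the Greenplum DataFrame to the type expected by Hive.
val castedDf = newSchema.fields.foldLeft(sourceDf) { (df, field) =>
  df.withColumn(field.name, col(field.name).cast(field.dataType))
}

// The table name here is illustrative.
castedDf.write.mode("append").insertInto("my_hive_table")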