What is the compatible datatype for bigint in Spark, and how can we cast bigint into a Spark-compatible datatype?

I am trying to move data from Greenplum to HDFS using Spark. I can read the data from the source table successfully, and the Spark-inferred schema of the DataFrame (read from the Greenplum table) is:

DataFrame Schema:

 je_header_id: long (nullable = true)
 je_line_num: long (nullable = true)
 last_updated_by: decimal(15,0) (nullable = true)
 last_updated_by_name: string (nullable = true)
 ledger_id: long (nullable = true)
 code_combination_id: long (nullable = true)
 balancing_segment: string (nullable = true)
 cost_center_segment: string (nullable = true)
 period_name: string (nullable = true)
 effective_date: timestamp (nullable = true)
 status: string (nullable = true)
 creation_date: timestamp (nullable = true)
 created_by: decimal(15,0) (nullable = true)
 entered_dr: decimal(38,20) (nullable = true)
 entered_cr: decimal(38,20) (nullable = true)
 entered_amount: decimal(38,20) (nullable = true)
 accounted_dr: decimal(38,20) (nullable = true)
 accounted_cr: decimal(38,20) (nullable = true)
 accounted_amount: decimal(38,20) (nullable = true)
 xx_last_update_log_id: integer (nullable = true)
 source_system_name: string (nullable = true)
 period_year: decimal(15,0) (nullable = true)
 period_num: decimal(15,0) (nullable = true)

The corresponding schema of the Hive table is:

je_header_id:bigint|je_line_num:bigint|last_updated_by:bigint|last_updated_by_name:string|ledger_id:bigint|code_combination_id:bigint|balancing_segment:string|cost_center_segment:string|period_name:string|effective_date:timestamp|status:string|creation_date:timestamp|created_by:bigint|entered_dr:double|entered_cr:double|entered_amount:double|accounted_dr:double|accounted_cr:double|accounted_amount:double|xx_last_update_log_id:int|source_system_name:string|period_year:bigint|period_num:bigint
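One way to get from this string to per-field types is to split it into (name, type) pairs first; a minimal sketch (parseHiveSchema is a hypothetical helper, not code from the original job):

// Split a pipe-delimited "name:type|name:type|..." schema string
// into (column name, Hive type name) pairs.
def parseHiveSchema(schema: String): Seq[(String, String)] =
  schema.split("\\|").toSeq.map { field =>
    val Array(name, hiveType) = field.split(":", 2)
    (name.trim, hiveType.trim)
  }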

Using the Hive table schema above, I created the StructType with the following logic:

import org.apache.spark.sql.types._

// Map a Hive type name to its corresponding Spark SQL DataType.
def convertDatatype(datatype: String): DataType = {
  datatype match {
    case "string"    => StringType
    case "bigint"    => LongType
    case "int"       => IntegerType
    case "double"    => DoubleType
    case "date"      => TimestampType
    case "boolean"   => BooleanType
    case "timestamp" => TimestampType
    // Any type name without a case here (e.g. "decimal(15,0)") throws scala.MatchError.
  }
}
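Mapping each parsed pair through convertDatatype then yields the StructType; a sketch, assuming the hypothetical parseHiveSchema helper above and a hiveSchema value holding the pipe-delimited string:

import org.apache.spark.sql.types.{StructField, StructType}

// hiveSchema is assumed to hold the pipe-delimited schema string shown above.
val newSchema: StructType = StructType(
  parseHiveSchema(hiveSchema).map { case (name, hiveType) =>
    StructField(name, convertDatatype(hiveType), nullable = true)
  }
)
newSchema.printTreeString()

printTreeString() prints the tree shown below.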

Prepared Schema:

 je_header_id: long (nullable = true)
 je_line_num: long (nullable = true)
 last_updated_by: long (nullable = true)
 last_updated_by_name: string (nullable = true)
 ledger_id: long (nullable = true)
 code_combination_id: long (nullable = true)
 balancing_segment: string (nullable = true)
 cost_center_segment: string (nullable = true)
 period_name: string (nullable = true)
 effective_date: timestamp (nullable = true)
 status: string (nullable = true)
 creation_date: timestamp (nullable = true)
 created_by: long (nullable = true)
 entered_dr: double (nullable = true)
 entered_cr: double (nullable = true)
 entered_amount: double (nullable = true)
 accounted_dr: double (nullable = true)
 accounted_cr: double (nullable = true)
 accounted_amount: double (nullable = true)
 xx_last_update_log_id: integer (nullable = true)
 source_system_name: string (nullable = true)
 period_year: long (nullable = true)
 period_num: long (nullable = true)

When I try to apply this new schema to the DataFrame, I get an exception:

java.lang.RuntimeException: java.math.BigDecimal is not a valid external type for schema of bigint

I understand that it is trying to convert BigDecimal to bigint and failing, but could anyone tell me how to cast bigint to a Spark-compatible datatype? If not, how can I modify my logic to produce the proper datatypes in the case statement for this bigint/BigDecimal problem?

Torque asked Nov 06 '25 22:11

1 Answer

From your question, it seems you are trying to fit a BigDecimal value into a bigint column, which is not right. BigDecimal is a decimal with a fixed precision (the maximum total number of digits) and scale (the number of digits to the right of the decimal point), whereas your values look like plain long values.

Instead of keeping the BigDecimal datatype, cast those columns to LongType so the bigint values convert correctly. See if this solves your purpose.
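A minimal sketch of that cast, assuming df is the DataFrame read from Greenplum and newSchema is the prepared StructType from the question (both names are assumptions, not from the answer):

import org.apache.spark.sql.functions.col

// df and newSchema are assumed: the source DataFrame and the target StructType.
// Cast each column to the type declared in the target schema, so that
// decimal(15,0) becomes long (bigint) and decimal(38,20) becomes double.
val castedDf = df.select(
  newSchema.fields.map(f => col(f.name).cast(f.dataType).as(f.name)): _*
)
castedDf.printSchema()

Casting the columns rewrites the underlying values, whereas re-applying a schema to the existing rows only relabels them, which is why the BigDecimal external values fail validation against a bigint field.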

Ajay Kharade answered Nov 08 '25 14:11

