 

Spark error - Decimal precision 39 exceeds max precision 38

When I try to collect data from a Spark DataFrame, I get an error stating:

"java.lang.IllegalArgumentException: requirement failed: Decimal precision 39 exceeds max precision 38".

All the data in the Spark DataFrame comes from an Oracle database, where I believe the decimal precision is less than 38. Is there any way I can collect the data without modifying it?

# Load required table into memory from Oracle database
df <- loadDF(sqlContext, source = "jdbc", url = "jdbc:oracle:thin:usr/[email protected]:1521" , dbtable = "TBL_NM")

RawData <- df %>% 
    filter(DT_Column > DATE('2015-01-01'))

RawData <- as.data.frame(RawData)

This gives the following error. Below is the stack trace:

WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10...***, executor 0): java.lang.IllegalArgumentException: requirement failed: Decimal precision 39 exceeds max precision 38
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.sql.types.Decimal.set(Decimal.scala:113)
    at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:426)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$9.apply(JdbcUtils.scala:337)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$9.apply(JdbcUtils.scala:337)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$nullSafeConvert(JdbcUtils.scala:438)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3.apply(JdbcUtils.scala:337)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3.apply(JdbcUtils.scala:335)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:286)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:268)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Please suggest a solution. Thank you.

asked May 23 '17 by Koushik


1 Answer

I ran across this with AWS Glue and Postgres. A bug fix that went into Spark 2.1.0 resolved it for most people, but someone also posted a workaround in the comments that uses the customSchema option of the JDBC reader.
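A minimal sketch of that customSchema workaround in the asker's SparkR setting, assuming a Spark version whose JDBC source supports the customSchema option; the column name AMT_COL and the DECIMAL(38, 2) target type are placeholders, not from the original post:

# Hypothetical sketch: tell the JDBC reader which type to use for the
# offending column instead of letting it infer an unsupported precision.
# AMT_COL and DECIMAL(38, 2) are placeholders.
df <- loadDF(sqlContext, source = "jdbc",
             url = "jdbc:oracle:thin:usr/[email protected]:1521",
             dbtable = "TBL_NM",
             customSchema = "AMT_COL DECIMAL(38, 2)")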

I had a similar issue with AWS Glue and Spark SQL: I was calculating a currency amount, so the result was a float. Glue threw the error "Decimal precision 1 exceeds max precision -1" even though the Glue Data Catalog defined the column as a decimal. Taking a page from the customSchema solution above, I explicitly cast the column as NUMERIC(10,2) and Spark stopped complaining.
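One way to apply that explicit-cast idea to the asker's Oracle read is to push the cast into the query Spark sends to the database, so the driver never returns a value with precision above 38. This is a hedged sketch: AMT_COL, the selected column list, and the NUMBER(38, 2) target type are assumptions, not from the original post:

# Hypothetical sketch: wrap the table in a subquery that casts the wide
# decimal column down to a supported precision before Spark reads it.
# AMT_COL and DT_Column stand in for the real column list.
df <- loadDF(sqlContext, source = "jdbc",
             url = "jdbc:oracle:thin:usr/[email protected]:1521",
             dbtable = "(SELECT CAST(AMT_COL AS NUMBER(38, 2)) AS AMT_COL, DT_Column FROM TBL_NM) t")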

answered Sep 28 '22 by cchen42