I have a large DataFrame made up of ~550 columns of doubles and two columns of longs (ids). The 550 columns are read in from a csv, and I add the two id columns. The only other things I do with the data are to change some of the csv data from strings to doubles ("Inf" -> "0", then cast the column to double) and to replace NaNs with 0:
df = df.withColumn(col.name + "temp",
  regexp_replace(
    regexp_replace(df(col.name), "Inf", "0"),
    "NaN", "0").cast(DoubleType))
df = df.drop(col.name).withColumnRenamed(col.name + "temp", col.name)
df = df.withColumn("timeId", monotonically_increasing_id.cast(LongType))
df = df.withColumn("patId", lit(num).cast(LongType))
df = df.na.fill(0)
When I do a count, I get the following error:
IllegalArgumentException: requirement failed: Decimal precision 6 exceeds max precision 5
There are hundreds of thousands of rows, and I'm reading in the data from multiple csvs. How do I increase the decimal precision? Is there something else that could be going on? I am only getting this error when I read in some of the csvs. Could they have more decimals than the others?
Class DecimalType: A Decimal that must have fixed precision (the maximum number of digits) and scale (the number of digits on the right side of the dot). The precision can be up to 38, and the scale can also be up to 38 (less than or equal to the precision). The default precision and scale is (10, 0).
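To make precision and scale concrete, here is a minimal Scala sketch (the column name and values are made up for illustration): decimal(7,2) allows at most 7 digits in total, 2 of them after the dot, and a value needing more simply does not fit.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder.master("local[*]").appName("decimal-demo").getOrCreate()
import spark.implicits._

// decimal(7,2): up to 7 digits total, 2 of them after the decimal point.
val demo = Seq("12345.67", "123456.78").toDF("raw")
  .withColumn("amount", col("raw").cast(DecimalType(7, 2)))
demo.show()
// 12345.67 fits; 123456.78 would need precision 8, so the cast yields null
// (rather than throwing) when ANSI mode is off, the default in older Spark versions.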
You can use format_number to format a number to the desired decimal places, as stated in the official API documentation: Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
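A minimal sketch of that, assuming `spark` is in scope (as in spark-shell); the literal value is made up:
import org.apache.spark.sql.functions.{format_number, lit}

// Round to 2 decimal places with grouping separators, e.g. 1234567.891 -> "1,234,567.89".
// The result is a StringType column, so it is for display, not arithmetic.
val pretty = spark.range(1).select(format_number(lit(1234567.891), 2).as("amount_fmt"))
pretty.show(truncate = false)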
Precision is the number of digits in a number. Scale is the number of digits to the right of the decimal point in a number. For example, the number 123.45 has a precision of 5 and a scale of 2. In SQL Server, the default maximum precision of numeric and decimal data types is 38.
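Sticking with the 123.45 example, a short Scala sketch (again assuming `spark` is in scope, as in spark-shell):
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DecimalType

// 123.45: precision 5 (total digits), scale 2 (digits after the dot).
spark.range(1).select(
  lit("123.45").cast(DecimalType(5, 2)).as("fits"),       // 123.45
  lit("123.45").cast(DecimalType(4, 2)).as("overflows")   // null: (4,2) leaves room for only 2 integer digits
).show()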
This answer applies to a Spark Structured Streaming application: by setting the console sink's "truncate" option to false, you can tell the output sink to display full column values instead of cutting them off.
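A minimal sketch of that option, assuming `streamingDf` is an existing streaming DataFrame:
// Print full cell contents instead of truncating them to 20 characters.
val query = streamingDf.writeStream
  .format("console")
  .option("truncate", "false")
  .start()
query.awaitTermination()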
I think the error is pretty self-explanatory: you need to be using a DecimalType, not a DoubleType.
Try this:
...
.cast(DecimalType(6, 0)))
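Dropped into the question's cleaning loop, the suggestion would look roughly like this (a sketch; whether precision 6 and scale 0 actually match your data is for you to verify):
df = df.withColumn(col.name + "temp",
  regexp_replace(
    regexp_replace(df(col.name), "Inf", "0"),
    "NaN", "0").cast(DecimalType(6, 0)))
df = df.drop(col.name).withColumnRenamed(col.name + "temp", col.name)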
Read on:
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/types/DecimalType.html
http://spark.apache.org/docs/2.0.2/api/python/_modules/pyspark/sql/types.html
datatype for handling big numbers in pyspark