
PySpark truncate a decimal

I'm working in PySpark and I have a variable LATITUDE that has a lot of decimal places. I need to create two new variables from it, one rounded and one truncated, both to three decimal places.

What is the simplest way to truncate a value?

For rounding, I did:

from pyspark.sql.functions import round
raw_data = raw_data.withColumn("LATITUDE_ROUND", round(raw_data.LATITUDE, 3))

This seems to work, but let me know if there is a better way.

Amber Z. asked Aug 03 '16 18:08



2 Answers

Try:

>>> from pyspark.sql.functions import col, lit, pow
>>> from pyspark.sql.types import LongType
>>>
>>> num_places = 3
>>> # Scale factor 10^num_places as an integer column
>>> m = pow(lit(10), num_places).cast(LongType())
>>> df = sc.parallelize([(0.6643, ), (0.6446, )]).toDF(["x"])
>>> # Scale up, cast to long to drop the remaining decimals, then scale back down
>>> df.withColumn("trunc", (col("x") * m).cast(LongType()) / m)
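
To see the arithmetic without a Spark session, here is a minimal plain-Python sketch of the same scale, cast, and divide steps. The helper name truncate is hypothetical, and int() stands in for the LongType cast on the assumption that both drop the fraction toward zero. The last call shows a caveat worth keeping in mind: binary floating point can leave the scaled value just below the integer you expect.

```python
def truncate(value: float, places: int) -> float:
    """Scale up, drop the fractional part toward zero, then scale back down."""
    m = 10 ** places            # plays the role of `m` in the PySpark snippet
    return int(value * m) / m   # int() discards the fraction (stand-in for the cast)


print(truncate(0.6643, 3))    # 0.664
print(truncate(-0.6646, 3))   # -0.664 (toward zero, not toward negative infinity)

# Caveat: 0.29 * 100 evaluates to 28.999999999999996, so the result is 0.28
print(truncate(0.29, 2))      # 0.28
```

If that last behaviour matters for your data, it may be worth checking values that sit exactly on a three-decimal boundary before relying on this approach.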

user6022341 answered Sep 17 '22 15:09


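You could use the floor() function. On its own it drops every decimal place, so scale by 1000 first and divide afterwards to keep three. So (without testing) I'd suggest:

from pyspark.sql.functions import floor
raw_data = raw_data.withColumn("LATITUDE_TRUNCATED", floor(raw_data.LATITUDE * 1000) / 1000)

But watch out for negative values: floor() rounds toward negative infinity, so a negative latitude moves away from zero instead of being truncated. See https://math.stackexchange.com/questions/344815/how-do-the-floor-and-ceiling-functions-work-on-negative-numbers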
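
The warning about negative values can be illustrated in plain Python, with no Spark needed. This is only a sketch: floor_to and trunc_to are hypothetical helper names, and the sample latitude is made up for the demonstration.

```python
import math


def floor_to(value: float, places: int) -> float:
    """Round toward negative infinity at the given number of decimal places."""
    m = 10 ** places
    return math.floor(value * m) / m


def trunc_to(value: float, places: int) -> float:
    """Drop the extra decimals, moving toward zero."""
    m = 10 ** places
    return math.trunc(value * m) / m


lat = 33.86785  # made-up sample value

# For positive values the two agree
print(floor_to(lat, 3), trunc_to(lat, 3))    # 33.867 33.867

# For negative values floor moves away from zero
print(floor_to(-lat, 3), trunc_to(-lat, 3))  # -33.868 -33.867
```

So for data that includes southern-hemisphere latitudes, a floor-based expression and a true truncation will disagree in the last kept digit.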

Grzegorz Oledzki answered Sep 17 '22 15:09