I'm working in PySpark and I have a column LATITUDE that has a lot of decimal places. I need to create two new columns from this, one that is rounded and one that is truncated, both to three decimal places.
What is the simplest way to truncate a value?
For rounding, I did:
raw_data = raw_data.withColumn("LATITUDE_ROUND", round(raw_data.LATITUDE, 3))
This seems to work, but let me know if there is a better way.
Format Number

You can use format_number to format a number to the desired decimal places, as stated in the official API documentation: "Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column."
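Keep in mind that format_number returns a string, not a numeric column. As a plain-Python analogue (outside Spark) of the formatting it applies, Python's format spec produces the same thousands-separated, rounded string:

```python
# Plain-Python analogue of Spark's format_number(col, 3):
# thousands separators plus rounding to 3 decimal places, as a string.
s = f"{1234567.891:,.3f}"
print(s)  # "1,234,567.891" -- a string, not a number
```

So if you need to keep doing arithmetic on the result, format_number is the wrong tool; use round or the truncation approaches below instead.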
Python's round() function takes the number to be rounded and, optionally, a second argument giving the number of decimal places. To round the number to 2 decimals, pass 2 as the second argument. pyspark.sql.functions.round, used in the withColumn call above, has the same (column, scale) signature.
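A quick sanity check of that signature with the built-in round() (plain Python, outside Spark):

```python
# The second argument selects the number of decimal places.
print(round(3.14159, 2))  # 3.14
print(round(3.14159, 3))  # 3.142
print(round(3.14159))     # 3 -- the second argument is optional
```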
PySpark's withColumn() is a DataFrame transformation used to change a column's values, convert an existing column's datatype, create a new column, and more.
Try:
>>> from pyspark.sql.functions import col, pow, lit
>>> from pyspark.sql.types import LongType
>>>
>>> num_places = 3
>>> m = pow(lit(10), num_places).cast(LongType())
>>> df = sc.parallelize([(0.6643, ), (0.6446, )]).toDF(["x"])
>>> df.withColumn("trunc", (col("x") * m).cast(LongType()) / m).show()
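The same multiply, cast to integer, divide trick can be sanity-checked in plain Python, where int() plays the role of the LongType cast (this is just an illustration of the arithmetic, not Spark code):

```python
num_places = 3
m = 10 ** num_places  # 1000

def truncate(x: float) -> float:
    # Scale up, drop everything past the integer part, scale back down.
    return int(x * m) / m

print(truncate(0.6643))  # 0.664
print(truncate(0.6446))  # 0.644
```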
You could use the floor() function from pyspark.sql.functions. On its own it drops all decimal places, so scale by 10^3 first and divide back afterwards. So (without testing) I'd suggest:

raw_data = raw_data.withColumn("LATITUDE_TRUNCATED", floor(raw_data.LATITUDE * 1000) / 1000)
But watch out for negative values: floor() rounds toward negative infinity rather than toward zero, as explained in https://math.stackexchange.com/questions/344815/how-do-the-floor-and-ceiling-functions-work-on-negative-numbers
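The difference only shows up for negative inputs. A plain-Python illustration comparing math.floor (toward negative infinity) with int (truncation toward zero):

```python
import math

# floor() moves toward negative infinity; int() truncates toward zero.
print(math.floor(-2.5))  # -3
print(int(-2.5))         # -2
print(math.floor(2.5))   # 2
print(int(2.5))          # 2 -- identical for positive values
```

So for latitudes south of the equator, the floor-based and cast-based approaches can disagree by one unit in the last decimal place; pick whichever definition of "truncate" you actually want.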