I am relatively new to Spark, and I've run into an issue when I try to use Python's built-in round() function after importing PySpark functions. It seems to be related to how I import the PySpark functions, but I am not sure what the difference is or why one way causes issues and the other doesn't.
Expected behavior:
import pyspark.sql.functions
print(round(3.14159265359,2))
>>> 3.14
Unexpected behavior:
from pyspark.sql.functions import *
print(round(3.14159265359,2))
>>> ERROR
AttributeError Traceback (most recent call last)
<ipython-input-1-50155ca4fa82> in <module>()
1 from pyspark.sql.functions import *
----> 2 print(round(3.1454848383,2))
/opt/spark/python/pyspark/sql/functions.py in round(col, scale)
503 """
504 sc = SparkContext._active_spark_context
--> 505 return Column(sc._jvm.functions.round(_to_java_column(col), scale))
506
507
AttributeError: 'NoneType' object has no attribute '_jvm'
Use import pyspark.sql.functions as F
to avoid the conflict.
That way you can keep using all of Python's built-in functions normally, and when you want a PySpark function, call it as F.round.
Don't do import * as it can mess up your namespace.
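For instance, a minimal sketch of that pattern (the SparkSession name spark and the column name "value" here are just for illustration):
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(3.14159265359,)], ["value"])

print(round(3.14159265359, 2))           # 3.14 -- Python's built-in round still works
df.select(F.round("value", 2)).show()    # PySpark's round, applied to a Column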
PySpark has its own round function: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.round
So with import *, the built-in round
is being shadowed by pyspark.sql.functions.round, which operates on a Column and needs an active SparkContext. With no SparkContext running, sc is None, which is exactly the AttributeError: 'NoneType' object has no attribute '_jvm' shown in the traceback.
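A quick way to confirm the shadowing (Python 3):
from pyspark.sql.functions import *
import builtins
print(round is builtins.round)  # False: `round` now refers to pyspark.sql.functions.round
print(round.__module__)         # pyspark.sql.functions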
If you have a long piece of code where you have used pyspark.sql.functions without a prefix like F., then in order to call Python's round specifically, you can go through the builtins module
in your PySpark code. @michael_west was almost right, but in Python 3 the module is builtins
instead of __builtin__ (__builtins__ is a CPython implementation detail and shouldn't be relied on).
Example code:
from builtins import round  # rebind `round` to Python's built-in version
k = round(123.456)          # 123
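Equivalently, you can qualify the call without rebinding the name:
import builtins
k = builtins.round(123.456)  # 123 -- always Python's round, even after a star import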