Python's built-in max(3, 6) works in the pyspark shell. But when the same call is put in an application and submitted, it throws an error: TypeError: _() takes exactly 1 argument (2 given)
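It looks like you have an import conflict in your application, most likely due to a wildcard import from pyspark.sql.functions (from pyspark.sql.functions import *), which replaces the built-in max with the Spark SQL function of the same name: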
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Python version 2.7.10 (default, Oct 19 2015 18:04:42)
SparkContext available as sc, HiveContext available as sqlContext.
In [1]: max(1, 2)
Out[1]: 2
In [2]: from pyspark.sql.functions import max
In [3]: max(1, 2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-bb133f5d83e9> in <module>()
----> 1 max(1, 2)
TypeError: _() takes exactly 1 argument (2 given)
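The traceback above comes down to plain name shadowing: the import rebinds the name max in the current namespace to a function that accepts a single column argument. A minimal sketch that reproduces this without Spark is shown here. It assumes Python 3 (in Python 2.7, as used in the session above, the module is called __builtin__), and fake_sql_max is a hypothetical stand-in for pyspark.sql.functions.max, not the real implementation.

```python
import builtins

def fake_sql_max(col):
    """Hypothetical stand-in for pyspark.sql.functions.max: takes one column name."""
    return "Column<max({})>".format(col)

# Rebinding the name has the same effect on lookups as
# `from pyspark.sql.functions import max` would have.
max = fake_sql_max

try:
    max(1, 2)            # two arguments, but the stand-in accepts only one
    shadow_error = None
except TypeError as exc:
    shadow_error = exc

# A quick check for whether the built-in has been shadowed in this namespace.
is_shadowed = max is not builtins.max

print(type(shadow_error).__name__, is_shadowed)
```

The same identity check can be dropped into an application to confirm whether an import has replaced a built-in.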
Unless you work in a relatively limited scope, it is best to either prefix:
from pyspark.sql import functions as sqlf
max(1, 2)
## 2
sqlf.max("foo")
## Column<max(foo)>
or alias:
from pyspark.sql.functions import max as max_
max(1, 2)
## 2
max_("foo")
## Column<max(foo)>
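If the shadowing import cannot be removed (for example, in code you do not control), the built-in can still be reached explicitly through the builtins module. The sketch below assumes Python 3 (on Python 2.7 the equivalent is import __builtin__ as builtins), and column_max is a hypothetical stand-in used only to simulate the shadowed name.

```python
import builtins  # Python 3; on Python 2.7: import __builtin__ as builtins

def column_max(col):
    """Hypothetical stand-in for a shadowing import such as pyspark.sql.functions.max."""
    return "Column<max({})>".format(col)

# Simulate the name having been shadowed by an earlier import.
max = column_max

# Going through the builtins module bypasses the shadowed name.
largest = builtins.max(1, 2)
print(largest)
```

This is a fallback rather than a replacement for the prefix or alias approaches, which keep the conflict from arising in the first place.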