
Python built-in function such as max() doesn't work in a PySpark application




The Python function max(3,6) works in the pyspark shell. But when the same code is put in an application and submitted, it throws an error: TypeError: _() takes exactly 1 argument (2 given)

user3610141 asked Jan 07 '23 06:01


1 Answer

It looks like you have an import conflict in your application, most likely caused by a wildcard import from pyspark.sql.functions, which replaces the built-in max with the one-argument Spark SQL version:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1

Using Python version 2.7.10 (default, Oct 19 2015 18:04:42)
SparkContext available as sc, HiveContext available as sqlContext.

In [1]: max(1, 2)
Out[1]: 2

In [2]: from pyspark.sql.functions import max

In [3]: max(1, 2)
TypeError                                 Traceback (most recent call last)
<ipython-input-3-bb133f5d83e9> in <module>()
----> 1 max(1, 2)

TypeError: _() takes exactly 1 argument (2 given)

Unless you work in a relatively limited scope, it is best to either use a prefix:

from pyspark.sql import functions as sqlf

max(1, 2)
## 2

sqlf.max("foo")
## Column<max(foo)>

or alias:

from pyspark.sql.functions import max as max_

max(1, 2)
## 2

max_("foo")
## Column<max(foo)>
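To see in advance which built-ins a wildcard import would clobber, one can compare the names the module exports with the names in the builtins module. This is a minimal Python 3 sketch; the helper name shadowed_builtins is hypothetical, and it follows Python's documented rule for `import *` (use `__all__` if the module defines it, otherwise every top-level name not starting with an underscore). The usage line checks os only because it ships with Python (os.open is a well-known example that hides the built-in open).

```python
import builtins
import importlib


def shadowed_builtins(module_name):
    """Return the built-in names that `from <module_name> import *` would rebind."""
    module = importlib.import_module(module_name)
    # Wildcard imports honour __all__ when present; otherwise they take
    # every public (non-underscore) name in the module namespace.
    exported = getattr(module, "__all__", None)
    if exported is None:
        exported = [name for name in dir(module) if not name.startswith("_")]
    return sorted(set(exported) & set(dir(builtins)))


print(shadowed_builtins("os"))
```

With PySpark installed, calling shadowed_builtins("pyspark.sql.functions") should include max in the result; the exact list depends on the Spark version.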
zero323 answered Feb 01 '23 14:02
