
AssertionError: all exprs should be Column

I union two PySpark DataFrames and then aggregate the result as follows:

exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)

But I get this error:

AssertionError: all exprs should be Column

What is wrong?

Dinosaurius asked Nov 13 '17 13:11



1 Answer

exprs = [max(x) for x in ["col1","col2"]]

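uses Python's built-in max, which returns the character with the highest code point in each string, i.e. ['o', 'o']. These are plain strings, not Column objects, so agg raises the AssertionError.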
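
A quick check in plain Python, with no Spark involved, shows what the built-in max produces for these column names. The variable name `names` is only for illustration:

```python
# Python's built-in max() applied to a string compares its characters
# by code point and returns the largest one.
names = ["col1", "col2"]
exprs = [max(name) for name in names]

print(exprs)                               # ['o', 'o']
print([type(e).__name__ for e in exprs])   # ['str', 'str']
```

Because every element is a str, passing this list to agg cannot work.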

Referring to the correct max, the one in pyspark.sql.functions, works:

>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]
philantrovert answered Sep 25 '22 22:09
