I combine two PySpark DataFrames with union() and then aggregate them as follows:
exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)
But I get this error:
AssertionError: all exprs should be Column
What is wrong?
exprs = [max(x) for x in ["col1","col2"]]
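uses Python's built-in max, not the Spark one. The built-in treats each string as a sequence of characters and returns the character with the highest code point, so exprs ends up as ['o', 'o'] instead of a list of Column objects. agg() only accepts Column expressions, hence the AssertionError.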
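A quick plain-Python check (no Spark needed) shows what the built-in max does with the column names. The variable names here are only for illustration:

```python
# Python's built-in max() iterates over a string character by character
# and returns the character with the highest code point.
column_names = ["col1", "col2"]
builtin_result = [max(name) for name in column_names]
print(builtin_result)  # ['o', 'o']

# Code points of the characters in "col1": 'o' (111) is the largest.
code_points = {ch: ord(ch) for ch in "col1"}
print(code_points)  # {'c': 99, 'o': 111, 'l': 108, '1': 49}
```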
Referring to the Spark max function from pyspark.sql.functions works:
>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]