
How to pass a parameter to the dictionary input of the PySpark agg function

Tags: python, pyspark

From the PySpark docs, I can do:

gdf = df.groupBy(df.name)
sorted(gdf.agg({"*": "first"}).collect())

In my actual use case I have many variables, so I like that I can simply create a dictionary. That is why @lemon's suggestion of calling the function on each column explicitly won't work for me:

gdf = df.groupBy(df.name)
sorted(gdf.agg(F.first(col, ignorenulls=True)).collect())

How can I pass a parameter to first (i.e. ignorenulls=True) when using the dictionary form?

safex asked Oct 29 '25 06:10


1 Answer

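The dictionary form of agg only maps a column name to a function name, so it cannot carry keyword arguments. You can use a list comprehension to build the column expressions instead (F here is pyspark.sql.functions):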

gdf.agg(*[F.first(x, ignorenulls=True).alias(x) for x in df.columns]).collect()
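If keeping a dictionary-driven workflow matters, one option is a small helper that expands a dictionary into the same kind of column expressions. The sketch below is only an illustration: the spec format (column name mapped to either a function name or a pair of function name and keyword arguments) and the helper names normalize_spec and build_agg_exprs are made up here and are not part of PySpark. It assumes each function name exists in pyspark.sql.functions and accepts a column name as its first argument, as first does.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical spec format (not a PySpark API): column name -> function name,
# or column name -> (function name, keyword arguments).
AggSpec = Dict[str, Union[str, Tuple[str, Dict[str, Any]]]]


def normalize_spec(spec: AggSpec) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Turn the dictionary into (column, function name, kwargs) triples."""
    triples = []
    for column, value in spec.items():
        if isinstance(value, str):
            # Plain string, as in the built-in dictionary form: no kwargs.
            triples.append((column, value, {}))
        else:
            func_name, kwargs = value
            triples.append((column, func_name, dict(kwargs)))
    return triples


def build_agg_exprs(spec: AggSpec) -> list:
    """Build aliased Column expressions that can be unpacked into agg()."""
    # Imported lazily so normalize_spec can be used without Spark installed.
    from pyspark.sql import functions as F

    return [
        # Look up the function by name, e.g. F.first, and pass the kwargs on.
        getattr(F, func_name)(column, **kwargs).alias(column)
        for column, func_name, kwargs in normalize_spec(spec)
    ]
```

With the gdf and df from the question, this could be called as gdf.agg(*build_agg_exprs({c: ("first", {"ignorenulls": True}) for c in df.columns})). Note that df.columns also contains the grouping column, so you may want to leave it out of the dictionary.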
Emma answered Oct 31 '25 12:10


