I have to do a 2 levels grouping on a pyspark dataframe. My tentative:
grouped_df=df.groupby(["A","B","C"])
grouped_df.groupby(["C"]).count()
But I get the following error:
'GroupedData' object has no attribute 'groupby'
I guess I should first convert the grouped object into a pySpark DF. But I cannot do that.
Any suggestion?
I had the same issue. The way I got around it was by first doing a "count()" after the first groupby, because that returns a Spark DataFrame, rather than the GroupedData object. Then you can do another groupby on that returned DataFrame.
So try:
grouped_df=df.groupby(["A","B","C"]).count()
grouped_df.groupby(["C"]).count()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With