 

Convert PySpark GroupedData object to Spark DataFrame

Tags:

pyspark-sql

I have to do a two-level grouping on a PySpark DataFrame. My attempt:

grouped_df = df.groupby(["A", "B", "C"])
grouped_df.groupby(["C"]).count()

But I get the following error:

'GroupedData' object has no attribute 'groupby'

I guess I should first convert the GroupedData object into a PySpark DataFrame, but I cannot find a way to do that.

Any suggestion?

Asked Oct 18 '17 by Mauro Gentile

1 Answer

I had the same issue. The way I got around it was to call count() after the first groupby, because that returns a Spark DataFrame rather than a GroupedData object. You can then run another groupby on the returned DataFrame.

So try:

grouped_df = df.groupby(["A", "B", "C"]).count()
grouped_df.groupby(["C"]).count()
Answered Sep 27 '22 by M. Rubins