 

Convert PySpark GroupedData object to Spark DataFrame

Tags:

pyspark-sql

I have to do a two-level grouping on a PySpark DataFrame. My attempt:

grouped_df = df.groupby(["A", "B", "C"])
grouped_df.groupby(["C"]).count()

But I get the following error:

'GroupedData' object has no attribute 'groupby'

I guess I should first convert the GroupedData object into a PySpark DataFrame, but I cannot find a way to do that.

Any suggestion?

Asked Oct 18 '17 by Mauro Gentile

1 Answer

I had the same issue. The way I got around it was to call count() after the first groupby, because that returns a Spark DataFrame rather than a GroupedData object. You can then run another groupby on the returned DataFrame.

So try:

grouped_df = df.groupby(["A", "B", "C"]).count()
grouped_df.groupby(["C"]).count()
Answered Sep 27 '22 by M. Rubins