
How to count frequency of each categorical variable in a column in pyspark dataframe?

Say I have a pyspark dataframe:

df.show()
+-----+---+
|  x  |  y|
+-----+---+
|alpha|  1|
|beta |  2|
|gamma|  1|
|alpha|  2|
+-----+---+

I want to count how many occurrences of alpha, beta, and gamma there are in column x. How do I do this in PySpark?

asked Jan 29 '23 by versatile parsley

1 Answer

Use pyspark.sql.DataFrame.cube():

df.cube("x").count().show()
answered Jan 30 '23 by versatile parsley