Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the equivalent of Panda's value_counts() in PySpark?

I am having the following python/pandas command:

df.groupby('Column_Name').agg(lambda x: x.value_counts().max() 

where I am getting the value counts for ALL columns in a DataFrameGroupBy object.

How do I do this action in PySpark?

like image 709
TSAR Avatar asked Jun 27 '18 13:06

TSAR


People also ask

Does PySpark have Pandas?

PySpark users can access the full PySpark APIs by calling DataFrame. to_spark() . pandas-on-Spark DataFrame and Spark DataFrame are virtually interchangeable. However, note that a new default index is created when pandas-on-Spark DataFrame is created from Spark DataFrame.

Can I use PySpark instead of Pandas?

In very simple words Pandas run operations on a single machine whereas PySpark runs on multiple machines. If you are working on a Machine Learning application where you are dealing with larger datasets, PySpark is a best fit which could processes operations many times(100x) faster than Pandas.

What does Value_counts () do in Pandas?

Return a Series containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element.

How do you convert PySpark DF to Pandas DF?

Convert PySpark Dataframe to Pandas DataFramePySpark DataFrame provides a method toPandas() to convert it to Python Pandas DataFrame. toPandas() results in the collection of all records in the PySpark DataFrame to the driver program and should be done only on a small subset of the data.


2 Answers

It's more or less the same:

spark_df.groupBy('column_name').count().orderBy('count') 

In the groupBy you can have multiple columns delimited by a ,

For example groupBy('column_1', 'column_2')

like image 76
Tanjin Avatar answered Sep 27 '22 22:09

Tanjin


try this when you want to control the order:

data.groupBy('col_name').count().orderBy('count', ascending=False).show() 
like image 43
der Fotik Avatar answered Sep 28 '22 00:09

der Fotik