I have the following Python/pandas command:
df.groupby('Column_Name').agg(lambda x: x.value_counts().max())
where I get the value counts for ALL columns in a DataFrameGroupBy object.
How do I do the same in PySpark?
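For context, here is a minimal, self-contained sketch of what that pandas command produces (the toy data and column names are made up for illustration):

import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'Column_Name': ['g1', 'g1', 'g1', 'g2', 'g2'],
    'other':       ['a',  'a',  'b',  'c',  'c'],
})

# For each group, the count of the most frequent value in every other column
print(df.groupby('Column_Name').agg(lambda x: x.value_counts().max()))
#              other
# Column_Name
# g1               2
# g2               2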
PySpark users can access the full PySpark API by calling DataFrame.to_spark(). A pandas-on-Spark DataFrame and a Spark DataFrame are virtually interchangeable; note, however, that a new default index is created when a pandas-on-Spark DataFrame is created from a Spark DataFrame.
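A minimal sketch of that conversion, assuming the pandas-on-Spark API that ships with pyspark >= 3.2 (the data is made up):

import pyspark.pandas as ps

# Hypothetical pandas-on-Spark DataFrame
psdf = ps.DataFrame({'Column_Name': ['g1', 'g2'], 'other': ['a', 'b']})

# Drop down to a plain Spark DataFrame to use the full PySpark API
sdf = psdf.to_spark()
sdf.printSchema()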
In simple terms, pandas runs operations on a single machine, whereas PySpark runs on multiple machines. If you are working on a machine learning application with large datasets, PySpark is the better fit: it can process operations many times (up to 100x) faster than pandas.
value_counts() returns a Series containing counts of unique values. The resulting object is in descending order, so the first element is the most frequently occurring one.
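For example:

import pandas as pd

s = pd.Series(['a', 'b', 'a', 'a', 'c'])
print(s.value_counts())
# a    3
# b    1
# c    1
# (most frequent value first; the exact trailer varies by pandas version)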
Convert a PySpark DataFrame to a pandas DataFrame: PySpark DataFrames provide a toPandas() method for this. toPandas() collects all records of the PySpark DataFrame into the driver program, so it should only be done on a small subset of the data.
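A short sketch (the SparkSession setup and toy data are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([('g1', 'a'), ('g2', 'b')], ['Column_Name', 'other'])

# toPandas() collects everything to the driver, so cap the size first
pdf = sdf.limit(1000).toPandas()
print(pdf.head())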
It's more or less the same:
spark_df.groupBy('column_name').count().orderBy('count')
In groupBy you can pass multiple columns, separated by commas.
For example: groupBy('column_1', 'column_2')
Try this when you want to control the order:
data.groupBy('col_name').count().orderBy('count', ascending=False).show()
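Note that a plain count() per group is not quite the original value_counts().max(), which takes, per group, the count of the most frequent value in each column. If you need that exact behaviour, a sketch along these lines should work (spark_df and the grouping column are assumed from the question):

from pyspark.sql import functions as F

group_col = 'Column_Name'
result = None
for c in [c for c in spark_df.columns if c != group_col]:
    # Count each (group, value) pair, then keep the largest count per group,
    # i.e. the PySpark analogue of value_counts().max() for column c
    top = (spark_df.groupBy(group_col, c).count()
                   .groupBy(group_col)
                   .agg(F.max('count').alias(c)))
    result = top if result is None else result.join(top, on=group_col)

result.show()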