PySpark: how to get the maximum absolute value of a column in a data frame?

Suppose I have

+----+---+
|  v1| v2|
+----+---+
|-1.0|  0|
| 0.0|  1|
| 1.0|  2|
|-2.0|  3|
+----+---+

I want to get the maximum absolute value of column v1, which is 2.0. Thanks!

asked Jan 17 '18 by kww

People also ask

How do you find the maximum value of a column in PySpark DataFrame?

We can get the maximum value of a column with the agg() method, which performs an aggregation over the DataFrame. It accepts a dictionary whose keys are column names and whose values are the names of aggregate functions, e.g. "max".

How do you get max and min in PySpark?

PySpark's SQL aggregate functions are grouped under "agg_funcs". The kurtosis() function returns the kurtosis of the values in the group, min() returns the minimum value in the column, and max() returns the maximum value in the column.

What is withColumn PySpark?

PySpark withColumn() is a DataFrame transformation used to change a column's values, convert the data type of an existing column, create a new column, and more.


1 Answer

Use agg with max and abs from pyspark.sql.functions:

import pyspark.sql.functions as F
df.agg(F.max(F.abs(df.v1))).first()[0]
# 2.0
answered Sep 20 '22 by Psidom