Suppose I have
+----+---+
|  v1| v2|
+----+---+
|-1.0|  0|
| 0.0|  1|
| 1.0|  2|
|-2.0|  3|
+----+---+
I want to get the max absolute value of column v1, which is 2.0. Thanks!
We can get the maximum value of a column in the DataFrame using the agg() method. This method performs aggregation, which combines the values within a column. It can take a dictionary as a parameter, where the key is the column name and the value is the name of the aggregate function, e.g. "max".
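As a minimal sketch of that dictionary form, assuming df is the example DataFrame from the question and an active SparkSession:

df.agg({"v1": "max"}).show()
# +-------+
# |max(v1)|
# +-------+
# |    1.0|
# +-------+

Note that this gives the plain maximum (1.0), not the maximum absolute value; handling abs() needs the expression form shown further down.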
The PySpark SQL aggregate functions are grouped under "agg_funcs" in PySpark. The kurtosis() function returns the kurtosis of the values in the group. The min() function returns the minimum value in the column. The max() function returns the maximum value present in the column.
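A quick sketch of min(), max(), and kurtosis() on the example DataFrame (the alias names are only for illustration):

import pyspark.sql.functions as F
df.agg(
    F.min("v1").alias("min_v1"),        # minimum value: -2.0
    F.max("v1").alias("max_v1"),        # maximum value:  1.0
    F.kurtosis("v1").alias("kurt_v1"),  # kurtosis of the values in the group
).show()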
PySpark withColumn() is a DataFrame transformation used to change a column's values, convert the datatype of an existing column, create a new column, and more.
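As a sketch, withColumn() combined with abs() can also answer the question; the column name "abs_v1" is just illustrative:

import pyspark.sql.functions as F
# Add an absolute-value column, then take its maximum
df.withColumn("abs_v1", F.abs("v1")).agg(F.max("abs_v1")).first()[0]
# 2.0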
Use agg with max and abs from pyspark.sql.functions:
import pyspark.sql.functions as F

# Take the absolute value of v1, aggregate with max, then pull out the scalar
df.agg(F.max(F.abs(df.v1))).first()[0]
# 2.0
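For completeness, a self-contained sketch that builds the example DataFrame and computes the same value (assuming a local SparkSession):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame(
    [(-1.0, 0), (0.0, 1), (1.0, 2), (-2.0, 3)],
    ["v1", "v2"],
)
print(df.agg(F.max(F.abs(df.v1))).first()[0])  # 2.0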