How to find maximum value of a column in python dataframe

I have a data frame in PySpark. It has a column called id whose values are unique.

Now I want to find the maximum value of the column id in the data frame.

I have tried the following:

df['id'].max()

But I got the error below:

TypeError: 'Column' object is not callable

Please let me know how to find the maximum value of a column in a data frame.

The link in the answer by @Dadep gives the correct answer.

User12345 asked May 11 '17 20:05



2 Answers

If you are using pandas, .max() will work:

>>> df2=pd.DataFrame({'A':[1,5,0], 'B':[3, 5, 6]})
>>> df2['A'].max()
5

Otherwise, if it is a Spark DataFrame, see:

Best way to get the max value in a Spark dataframe column

Dadep answered Oct 08 '22 11:10



I'm coming from Scala, but I believe this also applies to Python.

val maxId = df.select(max("id")).first()

But you first have to import the following:

from pyspark.sql.functions import max
Haroun Mohammedi answered Oct 08 '22 10:10
