 

Get min and max from a specific column of a Scala Spark DataFrame

I would like to access the min and max of a specific column of my DataFrame, but I don't have the column's header (its name), only its index. How should I do this in Scala?

Maybe something like this:

    val q = nextInt(ncol) // we pick a random value for a column number
    val col = df(q)
    val minimum = col.min()

Sorry if this sounds like a silly question, but I couldn't find any information on SO about it :/

asked Apr 05 '17 13:04 by Laure D



People also ask

How do I get the maximum value of a column in Spark with Scala?

Method 1: using select(). The max() function returns the maximum value of a column. In Scala it is imported from org.apache.spark.sql.functions (in PySpark it lives in the pyspark.sql.functions module); pass it to select(), and finally call collect() to retrieve the maximum from the resulting one-row DataFrame.
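A minimal Scala sketch of the select() + collect() approach, assuming it is run in spark-shell (Spark 3.x) or any Scala REPL with spark-sql on the classpath. The single-column DataFrame and its column name "value" are hypothetical sample data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Local session for the sketch; inside spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("max-example").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one integer column named "value".
val df = Seq(3, 7, 1, 9, 4).toDF("value")

// select(max(...)) yields a one-row, one-column DataFrame;
// collect() brings that row to the driver, and getInt(0) reads the value.
val maxValue: Int = df.select(max("value")).collect()(0).getInt(0)
println(s"max = $maxValue") // max = 9
```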

How do I select specific columns in a Spark DataFrame?

You can select one or more columns of a Spark DataFrame by passing the names of the columns you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame containing only the selected columns. The show() function displays the DataFrame's contents.
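A short sketch of select() by name, plus the by-position lookup the question needs, assuming spark-shell on Spark 3.x. The `people` DataFrame and its columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("select-example").getOrCreate()
import spark.implicits._

// Hypothetical three-column DataFrame.
val people = Seq(("Ann", 34, "Paris"), ("Bob", 28, "Oslo")).toDF("name", "age", "city")

// select() returns a new DataFrame; `people` itself is unchanged.
val nameAndAge = people.select("name", "age")
nameAndAge.show()

// When only a column's position is known, look its name up in `columns` first.
val secondColumn = people.select(people.columns(1))

println(nameAndAge.columns.mkString(", ")) // name, age
println(people.columns.length)            // 3
```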

What is the agg function in Spark?

agg is a DataFrame method that accepts aggregate functions such as min and max as arguments:

    scala> my_df.agg(min("column"))
    res0: org.apache.spark.sql.DataFrame = [min(column): double]
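A sketch that passes both min and max to a single agg call and reads the results from the returned Row, assuming spark-shell on Spark 3.x. The DataFrame below is a hypothetical stand-in for my_df (named myDf here, following Scala's camelCase convention).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min}

val spark = SparkSession.builder().master("local[*]").appName("agg-example").getOrCreate()
import spark.implicits._

// Hypothetical data: one double column named "column".
val myDf = Seq(2.5, -1.0, 8.0).toDF("column")

// Both aggregates are computed in one aggregation and come back as a single Row.
val row = myDf.agg(min("column"), max("column")).head()
val (minValue, maxValue) = (row.getDouble(0), row.getDouble(1))
println(s"min = $minValue, max = $maxValue") // min = -1.0, max = 8.0
```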

What does take() do in Spark?

take(num: int) → List[T]: takes the first num elements of the RDD. It works by first scanning one partition and using the results from that partition to estimate the number of additional partitions needed to satisfy the limit. This is the PySpark signature, translated from the Scala implementation in RDD#take(); in Scala, take returns an Array[T].
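A minimal Scala sketch of RDD.take, assuming spark-shell on Spark 3.x; the range 1 to 100 split into 4 partitions is hypothetical sample data.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("take-example").getOrCreate()
val sc = spark.sparkContext

// An RDD of 1..100 split over 4 partitions.
val rdd = sc.parallelize(1 to 100, numSlices = 4)

// take(n) returns the first n elements as a local Array[Int];
// here the first partition alone already holds enough elements.
val firstFive: Array[Int] = rdd.take(5)
println(firstFive.mkString(", ")) // 1, 2, 3, 4, 5
```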


1 Answer

How about getting the column name from the metadata:

    import org.apache.spark.sql.functions.{min, max}

    val selectedColumnName = df.columns(q) // name of the (q + 1)th column; the columns array is 0-indexed
    df.agg(min(selectedColumnName), max(selectedColumnName))
answered Sep 19 '22 11:09 by Justin Pihony
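One way to turn the answer's idea into a complete script, offered as a sketch rather than the answerer's own code. It assumes spark-shell on Spark 3.x, a hypothetical three-column DataFrame standing in for the asker's df, and scala.util.Random.nextInt as the source of the random index; ncol and q mirror the question's names. Because min and max keep the column's type (Int, Double, Long, ...), the results are read with the untyped Row.get.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min}
import scala.util.Random

val spark = SparkSession.builder().master("local[*]").appName("minmax-by-index").getOrCreate()
import spark.implicits._

// Hypothetical numeric DataFrame with an Int, a Double and a Long column.
val df = Seq((1, 10.0, 5L), (4, -2.0, 7L), (2, 3.5, 6L)).toDF("a", "b", "c")
val ncol = df.columns.length

// Pick a random column index in [0, ncol), as in the question.
val q = Random.nextInt(ncol)

// Look the column's name up by position, then aggregate on it.
val selectedColumnName = df.columns(q)
val row = df.agg(min(selectedColumnName), max(selectedColumnName)).head()

// Result types follow the column's type, so read them generically.
val minimum = row.get(0)
val maximum = row.get(1)
println(s"column $selectedColumnName: min = $minimum, max = $maximum")
```

If a column name contains a dot, Spark parses the string as a nested field reference, so such a name would need to be wrapped in backticks, for example min(s"`$selectedColumnName`").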
