I would like to access the min and max of a specific column in my DataFrame, but I don't have the column's header, just its number. How should I do this in Scala?
Maybe something like this:
val q = nextInt(ncol) // pick a random value for a column number
val col = df(q)
val minimum = col.min()
Sorry if this sounds like a silly question, but I couldn't find any info on SO about it :/
Method 1: Using the select() method. With the max() function we can get the maximum value of a column. To use it, we have to import it from the pyspark.sql.functions module, and finally we can use the collect() method to retrieve the maximum from the column.
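A rough Scala equivalent of that approach (the paragraph above describes the PySpark API; the DataFrame df and the column name "price" are placeholders for illustration):

import org.apache.spark.sql.functions.max

// select the aggregate, collect the single-row result to the driver,
// and read the value out of the first (and only) Row.
// Assumes the column holds doubles.
val maxValue = df.select(max("price")).collect()(0).getDouble(0)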
You can select a single column or multiple columns of a Spark DataFrame by passing the column names you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame's contents.
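For instance, a minimal sketch (the column names "name" and "age" are just placeholders):

// select() returns a new DataFrame with only the chosen columns;
// show() prints its contents to the console.
val subset = df.select("name", "age")
subset.show()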
agg is a DataFrame method that accepts those aggregate functions as arguments:

scala> my_df.agg(min("column"))
res0: org.apache.spark.sql.DataFrame = [min(column): double]
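To read the aggregated value back on the driver, you can take the single row that agg() produces, for example (assuming min is imported from org.apache.spark.sql.functions and the column holds doubles):

// first() returns the single Row produced by agg();
// getDouble(0) assumes the aggregated column is of double type.
val minValue = my_df.agg(min("column")).first().getDouble(0)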
take(num: int) → List[T] — Take the first num elements of the RDD. It works by first scanning one partition, and using the results from that partition to estimate the number of additional partitions needed to satisfy the limit. Translated from the Scala implementation in RDD#take().
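As a quick illustration in Scala (rdd is assumed to be any existing RDD; in Scala the result is a local Array rather than a List):

// take(5) returns the first five elements to the driver,
// scanning only as many partitions as needed.
val firstFive = rdd.take(5)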
How about getting the column name from the metadata:
import org.apache.spark.sql.functions.{min, max}

val selectedColumnName = df.columns(q) // pull the (q + 1)th column from the columns array
df.agg(min(selectedColumnName), max(selectedColumnName))
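If you want the values themselves rather than a one-row result DataFrame, you can pull them out of the first row, e.g. (assuming the column holds doubles):

val row = df.agg(min(selectedColumnName), max(selectedColumnName)).first()
val (minValue, maxValue) = (row.getDouble(0), row.getDouble(1))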