
Spark Select with a List of Columns Scala

I am trying to find a good way of doing a Spark select with a List[Column]. I am exploding a column, then passing back all the columns I am interested in along with my exploded column.

var columns = getColumns(x) // returns a List[Column]
tempDf.select(columns)      // does not compile: select expects Column*, not List[Column]

I am trying to find a good way of doing this. I know that if it were a list of strings I could do something like:

val result = dataframe.select(columnNames.head, columnNames.tail: _*)
asked Oct 07 '16 05:10 by neuroh

People also ask

How do I select multiple columns in Spark?

You can select single or multiple columns of a Spark DataFrame by passing the column names you want to the select() function. Since DataFrames are immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame contents.
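As a minimal sketch (the local SparkSession setup and the DataFrame schema here are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative setup: a local SparkSession and a small example DataFrame.
val spark = SparkSession.builder().master("local[*]").appName("select-demo").getOrCreate()
import spark.implicits._

val df = Seq(("Alice", 30, "Oslo"), ("Bob", 25, "Lima")).toDF("name", "age", "city")

// Select one column, or several, by name; each call returns a new DataFrame.
df.select("name").show()
df.select("name", "age").show()
```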

How do I get a list of columns in Spark DataFrame?

You can get all the columns of a Spark DataFrame by using df.columns, which returns the column names as an Array[String].
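For example, assuming an existing DataFrame df with three columns (the DataFrame and its column names are hypothetical):

```scala
import org.apache.spark.sql.functions.col

// `df` is assumed to already exist with columns "name", "age", "city".
val names: Array[String] = df.columns

// This pairs naturally with select: turn each name into a Column and expand the array.
df.select(names.map(col): _*).show()
```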


1 Answer

For Spark 2.0 it seems you have two options. Both depend on how you manage your columns (as Strings or as Columns).

Spark code (spark-sql_2.11/org/apache/spark/sql/Dataset.scala):

def select(cols: Column*): DataFrame = withPlan {
  Project(cols.map(_.named), logicalPlan)
}

def select(col: String, cols: String*): DataFrame = select((col +: cols).map(Column(_)) : _*)

You can see how, internally, Spark converts your head & tail into a list of Columns and calls select again.

So, in that case, if you want clear code I would recommend:

If columns: List[String]:

import org.apache.spark.sql.functions.col
df.select(columns.map(col): _*)

Otherwise, if columns: List[Column]:

df.select(columns: _*)
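Putting this together with the explode scenario from the question, a sketch under assumptions: the local SparkSession, the schema, and the column names below are illustrative (the question's getColumns and real schema are not in the source):

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().master("local[*]").appName("explode-select").getOrCreate()
import spark.implicits._

// Illustrative data: each id carries a list of values to be exploded.
val tempDf = Seq(("a", Seq(1, 2)), ("b", Seq(3))).toDF("id", "values")

// Build a List[Column]: the columns we want to keep plus the exploded one.
val columns: List[Column] = List(col("id"), explode(col("values")).as("value"))

// Expand the list into select's Column* vararg overload.
tempDf.select(columns: _*).show()
```

Expanding with `: _*` works for any Seq[Column], so the same call handles whatever getColumns returns as long as it is a collection of Columns.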
answered Oct 09 '22 15:10 by Franzi