Fetch Spark DataFrame column list

How do I get all the column names of a Spark DataFrame into a Seq variable?

Input Data & Schema

val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")

dataset1.printSchema()
root
|-- KEY1: string (nullable = true)
|-- KEY2: string (nullable = true)
|-- ID: string (nullable = true)

I need to store all the column names in a variable using Scala. I have tried the following, but it is not giving the expected result.

val selectColumns = dataset1.schema.fields.toSeq

selectColumns: Seq[org.apache.spark.sql.types.StructField] = WrappedArray(StructField(KEY1,StringType,true),StructField(KEY2,StringType,true),StructField(ID,StringType,true))

Expected output:

val selectColumns = Seq(
  col("KEY1"),
  col("KEY2"),
  col("ID")
)

selectColumns: Seq[org.apache.spark.sql.Column] = List(KEY1, KEY2, ID)
asked Oct 15 '17 by RaAm


People also ask

How do I get a list of columns in Spark DataFrame?

You can get all the columns of a Spark DataFrame by using df.columns, which returns the column names as an Array[String].

How do I get column values from Spark DataFrame?

To convert a Spark DataFrame column to a List, first select() the column you want, then use the map() transformation to convert each Row to a String, and finally collect() the data to the driver, which returns an Array[String].
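
A minimal Scala sketch of that select/map/collect pattern, assuming the dataset1 DataFrame from the question is in scope and spark.implicits._ is imported (as it is by default in spark-shell):

// Keep only the KEY1 column, turn each Row into its String value,
// and collect the results back to the driver.
val key1Values: Array[String] = dataset1
  .select("KEY1")
  .map(row => row.getString(0))
  .collect()

// key1Values: Array[String] = Array(66, 67, 70, 71)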

How do you select a list of columns in PySpark?

In PySpark, the select() function is used to select a single column, multiple columns, columns by index, all columns from a list, or nested columns from a DataFrame. select() is a transformation, so it returns a new DataFrame containing the selected columns.
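
The question above is about PySpark, but the Scala DataFrame API exposes the same select() pattern; here is a small sketch against the question's dataset1:

import org.apache.spark.sql.functions.col

// Each call returns a new DataFrame containing only the requested columns.
val byName     = dataset1.select("KEY1", "KEY2")                  // by column name
val byColumn   = dataset1.select(col("KEY1"), col("ID"))          // via Column objects
val allColumns = dataset1.select(dataset1.columns.map(col): _*)   // from a list of names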


2 Answers

You can use the following command:

val selectColumns = dataset1.columns.toSeq

scala> val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1: org.apache.spark.sql.DataFrame = [KEY1: string, KEY2: string ... 1 more field]

scala> val selectColumns = dataset1.columns.toSeq
selectColumns: Seq[String] = WrappedArray(KEY1, KEY2, ID)
answered Sep 20 '22 by Yaron


If you need a Seq[Column] (as in the expected output) rather than plain column names, map each name to a Column:

import org.apache.spark.sql.functions.col

val selectColumns = dataset1.columns.toList.map(col(_))
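
If the goal is to feed those columns back into a projection, the resulting Seq[Column] can be expanded as varargs (a usage sketch, assuming dataset1 and selectColumns from this answer are in scope):

// Select all columns via the Seq[Column] built above.
val projected = dataset1.select(selectColumns: _*)
projected.show()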
answered Sep 21 '22 by RaAm