
How to pass column names to the toDF() function of a Spark DataFrame using a string array?

For example,

val columns=Array("column1", "column2", "column3")
val df=sc.parallelize(Seq(
(1,"example1", Seq(0,2,5)),
(2,"example2", Seq(1,20,5)))).toDF(columns)

How can I set the column names from a string Array? Is it possible to specify data types inside toDF()?

Devi asked Jun 23 '16 13:06





1 Answer

toDF() takes a repeated parameter of type String, so you can use the _* type annotation to pass a sequence:

val df=sc.parallelize(Seq(
  (1,"example1", Seq(0,2,5)),
  (2,"example2", Seq(1,20,5)))).toDF(columns: _*)

For more on repeated parameters - see section 4.6.2 in the Scala Language Specification.
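
To see the mechanism without a Spark cluster, here is a minimal plain-Scala sketch. The object RepeatedParamsSketch and the method describe are hypothetical names made up for illustration; describe only mimics the shape of a method that takes a String* parameter and is not Spark's implementation of toDF().

```scala
// Hypothetical stand-in for a method with a repeated String parameter,
// such as toDF(colNames: String*). This is not Spark code.
object RepeatedParamsSketch {
  // `names: String*` is a repeated parameter; inside the method it is a Seq[String].
  def describe(names: String*): String = names.mkString(", ")

  def main(args: Array[String]): Unit = {
    val columns = Array("column1", "column2", "column3")

    // Passing the names one by one works as usual.
    println(describe("column1", "column2", "column3"))

    // describe(columns) would not compile: an Array[String] is not a String.
    // `columns: _*` expands the array into the repeated parameter.
    println(describe(columns: _*))
  }
}
```

Both calls receive the same three names. The call in the answer works the same way: toDF(columns: _*) hands each element of the array to toDF() as a separate column name.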

Tzach Zohar answered Oct 29 '22 07:10
