
How is it possible to add a new column to an existing DataFrame in Spark SQL?

I use the DataFrame API.

I have an existing DataFrame and a List object (an Array would also work). How can I add this List to the existing DataFrame as a new column? Should I use the Column class for this?

asked Aug 21 '15 by Guforu

2 Answers

You should probably convert your List to a single-column RDD and apply a join on criteria picked by you. A simple DataFrame conversion:

    import sqlContext.implicits._  // needed for the toDF conversion (Spark 1.x)
    val df1 = sparkContext.makeRDD(yourList).toDF("newColumn")

If you need an additional column to perform the join on, you can add more columns by mapping your list:

    val df1 = sparkContext.makeRDD(yourList).map(i => (i, fun(i))).toDF("newColumn", "joinOnThisColumn")
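As a minimal sketch of the join step, assuming the existing DataFrame is called `df` and its join key is a column named `"id"` (both hypothetical names), and `fun(i)` is whatever function produces the matching key for each list element:

    // Hypothetical names: `df` is your existing DataFrame, "id" is its join key.
    val result = df.join(df1, df("id") === df1("joinOnThisColumn"))
      .drop("joinOnThisColumn")  // keep only the new column alongside df's original columns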

I am not familiar with the Java version, but you should try using JavaSparkContext.parallelize(yourList) and applying similar mapping operations based on this doc.

answered Oct 20 '22 by TheMP


Sorry, it was my fault. I already found the function withColumn(String colName, Column col), which should solve my problem.
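For reference, a minimal sketch of withColumn, assuming an existing DataFrame `df` with a numeric column `"value"` (both hypothetical names):

    import org.apache.spark.sql.functions.lit

    // Add a constant column.
    val withConst = df.withColumn("newColumn", lit(42))
    // Add a column derived from an existing one.
    val derived = df.withColumn("doubled", df("value") * 2)

Note that withColumn expects a Column expression (a literal or something computed from the same DataFrame), not a local List, so attaching a List as a column still calls for the join approach described above.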

answered Oct 20 '22 by Guforu