I use the DataFrame API.
I have an existing DataFrame and a List object (an Array would also work). How can I add this List to the existing DataFrame as a new column? Should I use the Column class for this?
You should probably convert your List to a single-column RDD, turn it into a DataFrame, and join it with the existing DataFrame on criteria you pick. A simple DataFrame conversion:
val df1 = sparkContext.makeRDD(yourList).toDF("newColumn")
If you need an additional column to perform the join on, you can add more columns by mapping over your list:
val df1 = sparkContext.makeRDD(yourList).map(i => (i, fun(i))).toDF("newColumn", "joinOnThisColumn")
I am not familiar with the Java version, but you could try JavaSparkContext.parallelize(yourList)
and apply similar mapping operations based on this doc.
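To make the join idea above concrete, here is a minimal sketch that pairs each list element with a row by position. It rests on several assumptions that are not in the answer: it uses SparkSession (Spark 2.0 or later) instead of a bare sparkContext, the list holds Int values, the helper name addListColumn and the temporary column rowIdx are made up for illustration, and the existing DataFrame is assumed to have a stable row order. Treat it as one possible approach, not the definitive one.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object AddListAsColumn {
  // Hypothetical helper: joins `values` onto `df` by row position.
  def addListColumn(spark: SparkSession, df: DataFrame, values: Seq[Int], name: String): DataFrame = {
    import spark.implicits._
    val idx = "rowIdx" // temporary join key, assumed not to clash with existing columns

    // Append a positional index to every row of the existing DataFrame.
    val indexedRows = df.rdd.zipWithIndex.map { case (row, i) => Row.fromSeq(row.toSeq :+ i) }
    val indexedSchema = StructType(df.schema.fields :+ StructField(idx, LongType, nullable = false))
    val left = spark.createDataFrame(indexedRows, indexedSchema)

    // Turn the list into a two-column DataFrame: (value, position).
    val right = spark.sparkContext.parallelize(values).zipWithIndex.toDF(name, idx)

    // Inner join on the position, restore the order, then drop the helper column.
    left.join(right, idx).orderBy(idx).drop(idx)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("add-list-column").getOrCreate()
    import spark.implicits._

    val existing = Seq("a", "b", "c").toDF("letter")
    val yourList = List(10, 20, 30)

    addListColumn(spark, existing, yourList, "newColumn").show()
    spark.stop()
  }
}
```

Because this is an inner join, rows without a matching list element (or list elements without a matching row) are dropped when the two lengths differ.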
Sorry, it was my fault. I have already found the function withColumn(String colName, Column col),
which should solve my problem.
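For reference, a small sketch of how withColumn is typically called. The column names, the sample data, and the helper name extend are placeholders, and SparkSession (Spark 2.0 or later) is an assumption. Note that the Column passed in has to be an expression over the same DataFrame or a literal, so a local List still needs to be turned into a DataFrame and joined before its values can appear as a column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object WithColumnSketch {
  // Hypothetical helper: adds one derived column and one constant column.
  def extend(df: DataFrame): DataFrame =
    df.withColumn("doubled", col("value") * 2) // computed from an existing column
      .withColumn("source", lit("list"))       // same literal value on every row

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("with-column-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("value")
    extend(df).show()
    spark.stop()
  }
}
```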