
How can I add a column with a value to a new Dataset in Spark Java?

I'm creating some Datasets with the Java Spark API. These Datasets are populated from a Hive table, using the spark.sql() method.

After performing some SQL operations (like joins), I have a final Dataset. What I want to do is add a new column to that final Dataset, with a value of 1 for every row. You could see it as adding a constant column to the Dataset.

So, for example I have this dataset:

Dataset<Row> final = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));

I want to add a new column to the "final" Dataset, something like this:

final.addNewColumn("colName", 1); //I know this doesn't work, but just to give you an idea.

Is there a feasible way to add the new column to all the rows of the Dataset with a value of 1?

asked Jul 06 '17 by Juan Carlos Nuño

1 Answer

If you want to add a constant value, you can use the lit function:

lit(Object literal)
Creates a Column of literal value.

Also, final is a reserved keyword in Java, so change the variable name to something else:

import static org.apache.spark.sql.functions.lit;

Dataset<Row> final12 = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));

// Add a new column named "columnName" with the constant value 1 in every row
Dataset<Row> result = final12.withColumn("columnName", lit(1));
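As a quick check, here is a minimal sketch of how you could verify the new column (assuming the result Dataset above; the column name "columnName" is just the illustrative one used in this answer):

// Confirm the constant column was added
result.printSchema();   // "columnName" should appear as an integer column
result.show(5);         // every row should show 1 in "columnName"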

Hope this helps!

answered Sep 28 '22 by koiralo