Update value in struct type column in java spark

Question

I want capability to update value in nested dataset. For this I have a created as nested Dataset in Spark. It has below schema structure:-

root

 |-- field_a: string (nullable = false)

 |-- field_b: struct (nullable = true)

 |    |-- field_d: struct(nullable = false)
          |-- field_not_to_update: string(nullable = true)

 |        |-- field_to_update: string(nullable = false)
 |   field_c: string (nullable = false)

Now I wanted to update value in field_to_update in the dataset. I have tried

aFooData.withColumn("field_b.field_d.field_to_update", lit("updated_val")

Also tried,

aFooData.foreach(new ClassWithForEachFunction());

where ClassWithForEachFunction implements ForEachFunction<Row> and has method public void call(Row aRow) to update field_to_update attribute. Tried same with lamda as well but it was throwing Task not serializable exception so has to go for long process.

None of them are fruitful so far and I am getting same Dataset with foreach and new column with name field_b.field_d.field_to_update in second case. Any other suggestions for same?

Srinivas · Accepted Answer

Please check below code.

Extract the fields from struct
Update the required filed.
Reconstruct the struct back.

scala> df.show(false)
+-------+--------------+
|field_a|field_b       |
+-------+--------------+
|parentA|[srinivas, 20]|
|parentB|[ravi, 30]    |
+-------+--------------+


scala> df.printSchema
root
 |-- field_a: string (nullable = true)
 |-- field_b: struct (nullable = true)
 |    |-- field_to_update: string (nullable = true)
 |    |-- field_not_to_update: integer (nullable = true)


scala> df.select("field_a","field_b.field_to_update","field_b.field_not_to_update").withColumn("field_to_update",lit("updated_val")).select(col("field_a"),struct(col("field_to_update"),col("field_not_to_update")).as("field_b")).show(false)
+-------+-----------------+
|field_a|field_b          |
+-------+-----------------+
|parentA|[updated_val, 20]|
|parentB|[updated_val, 30]|
+-------+-----------------+

Update value in struct type column in java spark

Tags:

java

apache-spark

apache-spark-sql

AlphaBetaGamma

1 Answers

Srinivas

Recent Activity

Donate For Us

Update value in struct type column in java spark

Tags:

java

apache-spark

apache-spark-sql

AlphaBetaGamma

1 Answers

Srinivas

Related questions

Recent Activity

Donate For Us