I want to delete one field from an array of structs, as follows:
import scala.collection.mutable
import org.apache.spark.sql.functions

case class myObj(id: String, item_value: String, delete: String)
case class myObj2(id: String, item_value: String)

val df2 = Seq(
  ("1", "2", "..100values", Seq(myObj("A", "1a", "1"), myObj("B", "4r", "2"))),
  ("1", "2", "..100values", Seq(myObj("X", "1p", "11"), myObj("V", "7w", "8")))
).toDF("1", "2", "100fields", "myArr")

val deleteColumn: (mutable.WrappedArray[myObj] => mutable.WrappedArray[myObj2]) = {
  (array: mutable.WrappedArray[myObj]) => array.map(o => myObj2(o.id, o.item_value))
}
val myUDF3 = functions.udf(deleteColumn)
df2.withColumn("newArr", myUDF3($"myArr")).show(false)
The error is very clear:
Exception in thread "main" org.apache.spark.SparkException: Failed to execute user defined function(anonfun$1: (array<struct<id:string,item_value:string,delete:string>>) => array<struct< id:string,item_value:string>>)
The types do not match, but that is exactly what I want to do: convert one structure into another. Is that possible?
I am using a UDF because df.map() is not a good fit for mapping one specific column: it forces me to spell out all the columns. So far I have not found a better way to apply this mapping to a single column.
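You can rewrite your UDF so that it takes a Row instead of the custom object, as below. Spark passes struct values to a Scala UDF as Row objects, not as instances of your case class, which is why the original UDF fails.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Each struct arrives as a Row; read the two fields to keep by position
val deleteColumn = udf((value: Seq[Row]) => {
  value.map(row => myObj2(row.getString(0), row.getString(1)))
})
df2.withColumn("newArr", deleteColumn($"myArr")).show(false)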
Output:
+---+---+-----------+---------------------+----------------+
|1  |2  |100fields  |myArr                |newArr          |
+---+---+-----------+---------------------+----------------+
|1  |2  |..100values|[[A,1a,1], [B,4r,2]] |[[A,1a], [B,4r]]|
|1  |2  |..100values|[[X,1p,11], [V,7w,8]]|[[X,1p], [V,7w]]|
+---+---+-----------+---------------------+----------------+
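The positional `getString(0)` / `getString(1)` calls above depend on the order of the struct fields. As a minimal sketch, assuming the rows Spark hands to the UDF carry their schema (they normally do for struct columns), the fields can be read by name with `getAs` instead. The names `Item`, `ItemSlim`, `DropFieldWithUdf` and `run` are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical case classes, declared at top level so Spark can derive the UDF's return type.
case class Item(id: String, item_value: String, delete: String)
case class ItemSlim(id: String, item_value: String)

object DropFieldWithUdf {
  // Builds a small DataFrame and adds "newArr" without the "delete" field.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq(
      ("1", Seq(Item("A", "1a", "1"), Item("B", "4r", "2")))
    ).toDF("key", "myArr")

    // Each array element arrives as a Row; look fields up by name rather than by index.
    // Note: a null array would need an explicit null check here.
    val dropDelete = udf((items: Seq[Row]) =>
      items.map(r => ItemSlim(r.getAs[String]("id"), r.getAs[String]("item_value"))))

    df.withColumn("newArr", dropDelete(col("myArr")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("drop-field-udf").getOrCreate()
    run(spark).printSchema()
    spark.stop()
  }
}
```

Reading by name keeps the UDF working if the struct's field order changes, at the cost of a name lookup per row.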
Without a UDF, you can remove fields from an array of structs by using dropFields together with transform (Column.dropFields is available from Spark 3.1).
Test input:
val df = spark.createDataFrame(Seq(("v1", "v2", "v3", "v4"))).toDF("f1", "f2", "f3", "f4")
  .select(
    array(
      struct("f1", "f2"),
      struct(col("f3").as("f1"), col("f4").as("f2"))
    ).as("myArr")
  )
df.printSchema()
// root
//  |-- myArr: array (nullable = false)
//  |    |-- element: struct (containsNull = false)
//  |    |    |-- f1: string (nullable = true)
//  |    |    |-- f2: string (nullable = true)
Script:
val df2 = df.withColumn(
  "myArr",
  transform(
    $"myArr",
    x => x.dropFields("f2")
  )
)
df2.printSchema()
// root
//  |-- myArr: array (nullable = false)
//  |    |-- element: struct (containsNull = false)
//  |    |    |-- f1: string (nullable = true)
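Applied to the data from the question, the same idea might look like the sketch below. It assumes Spark 3.1 or later for `dropFields`; the second column shows a possible fallback for older versions (the SQL `transform` higher-order function exists since Spark 2.4), which rebuilds the struct from the fields to keep. The names `Entry`, `DropFieldWithoutUdf`, `build`, `viaDropFields` and `viaSqlExpr` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, transform}

// Hypothetical case class mirroring the struct from the question.
case class Entry(id: String, item_value: String, delete: String)

object DropFieldWithoutUdf {
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq(
      ("1", Seq(Entry("A", "1a", "1"), Entry("B", "4r", "2")))
    ).toDF("key", "myArr")

    df
      // Spark 3.1+: drop the unwanted field from every array element.
      .withColumn("viaDropFields", transform(col("myArr"), x => x.dropFields("delete")))
      // Older versions: rebuild each element from the fields that should stay.
      .withColumn(
        "viaSqlExpr",
        expr("transform(myArr, x -> named_struct('id', x.id, 'item_value', x.item_value))"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("drop-field-no-udf").getOrCreate()
    build(spark).printSchema()
    spark.stop()
  }
}
```

With `dropFields` you name only the field to remove, which suits structs with many fields; the `named_struct` variant has to list every field that is kept.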