I'm trying to perform a simple map on a Dataset[Row] (DataFrame) in Spark 2.0.0. Something as simple as this:
val df: Dataset[Row] = ...
df.map { r: Row => r }
But the compiler is complaining that I'm not providing the implicit Encoder[Row] argument to the map function:
not enough arguments for method map: (implicit evidence$7: Encoder[Row]).
Everything works fine if I convert to an RDD first, df.rdd.map { r: Row => r }, but shouldn't there be an easy way to get an Encoder[Row], as there is for tuple types (Encoders.product[(Int, Double)])?
[Note that my Row is dynamically sized in such a way that it can't easily be converted into a strongly-typed Dataset.]
Sorry to be a "bit" late. Hopefully this helps someone who is hitting the problem right now. The easiest way to define an encoder is deriving the structure from an existing DataFrame:
val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")
val myEncoder = RowEncoder(df.schema)
Such an approach can be useful when you need to alter existing fields of your original DataFrame, as in the sketch below.
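For instance (a minimal sketch reusing df and myEncoder from the snippet above), values can be rewritten while the schema stays the same, passing the derived encoder to map explicitly:
// Upper-case the "name" column; the schema is unchanged,
// so the encoder derived from df.schema still applies.
val upperCased = df.map { r: Row =>
  Row(r.getInt(0), r.getString(1).toUpperCase)
}(myEncoder)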
If you're dealing with a completely new structure, an explicit definition relying on StructType and StructField (as suggested in @Reactormonk's slightly cryptic response) works instead. An example defining the same encoder:
val myEncoder2 = RowEncoder(StructType(
  Seq(
    StructField("id", IntegerType),
    StructField("name", StringType)
  )))
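Either encoder can also be brought into implicit scope, so the map call from the question compiles without an explicit argument (a small sketch reusing myEncoder2 from above):
// With an implicit Encoder[Row] in scope, map resolves its evidence parameter itself.
implicit val rowEncoder: Encoder[Row] = myEncoder2
val mapped = df.map { r: Row => r }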
Please remember that org.apache.spark.sql._, org.apache.spark.sql.types._, and org.apache.spark.sql.catalyst.encoders.RowEncoder have to be imported:
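import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.encoders.RowEncoder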
In your specific case where the mapped function does not change the schema, you can pass in the encoder of the DataFrame itself:
df.map(r => r)(df.encoder)
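Should df.encoder not be accessible in your Spark version, rebuilding the encoder from the (unchanged) schema is an equivalent sketch:
// The mapped function keeps the schema, so an encoder derived from it fits as well.
df.map(r => r)(RowEncoder(df.schema))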