In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff:
def foo(r: Row) = {
  val ix = (0 until r.schema.length).map(i => r.schema(i).name -> i).toMap
  val field1 = r.getString(ix("field1"))
  val field2 = r.getLong(ix("field2"))
  ...
}
dataframe.map(foo)
I figure there must be a better way - this is pretty verbose, it requires creating this extra structure, and it also requires knowing the types explicitly, which, if incorrect, will produce a runtime exception rather than a compile-time error.
A row in Spark is an ordered collection of fields that can be accessed starting at index 0. The row is a generic object of type Row. The columns making up a row can be of the same or of different types.
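For example, with positional access (a hypothetical row built with the Row factory):

import org.apache.spark.sql.Row

val r = Row("alice", 42L)   // fields of different types in one row
val name = r.getString(0)   // access by position, starting at index 0
val count = r.getLong(1)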
You can use getAs from org.apache.spark.sql.Row, giving the expected type as a type parameter:

r.getAs[String]("field1")
r.getAs[Long]("field2")
See the Row API documentation for getAs(java.lang.String fieldName).
This is not supported in the Scala API at this time. The closest thing available is the JIRA ticket titled "Support converting DataFrames to typed RDDs".
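For what it's worth, Spark 2.0+ later covered much of this ground with typed Datasets rather than typed RDDs. A minimal sketch, assuming a case class whose field names and types match the schema:

import org.apache.spark.sql.SparkSession

// Hypothetical case class mirroring the question's field1/field2 schema.
case class Record(field1: String, field2: Long)

val spark = SparkSession.builder().master("local[*]").appName("typed-ds-demo").getOrCreate()
import spark.implicits._

val dataframe = Seq(("a", 1L), ("b", 2L)).toDF("field1", "field2")

// as[Record] checks column names and types against the case class during
// analysis; after that, fields are accessed by name with static types.
val ds = dataframe.as[Record]
ds.map(r => r.field1.length + r.field2).show()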