Apache Spark: get elements of Row by name

In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff:

import org.apache.spark.sql.Row

def foo(r: Row) = {
  // Build a name -> index map from the schema, then fetch each field by index
  val ix = (0 until r.schema.length).map(i => r.schema(i).name -> i).toMap
  val field1 = r.getString(ix("field1"))
  val field2 = r.getLong(ix("field2"))
  ...
}
dataframe.map(foo)

I figure there must be a better way. This is pretty verbose, it requires building this extra structure, and it requires knowing the types explicitly; if they're wrong, you get a runtime exception rather than a compile-time error.

asked Jun 05 '15 by Ken Williams

People also ask

What is org.apache.spark.sql.Row?

A Row in Spark is an ordered collection of fields that can be accessed starting at index 0. The row is a generic object of type Row. The columns making up a row can be of the same or different types.
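As a small illustration (the values here are made up, not from the question), fields are positional and can mix types:

import org.apache.spark.sql.Row

val row = Row("alice", 42L)   // an ordered collection of fields
val name = row.getString(0)   // index-based access starts at 0
val count = row.getLong(1)    // fields in one row can have different types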


2 Answers

You can use getAs from org.apache.spark.sql.Row. It takes a type parameter, so the lookup is by name and the cast is explicit:

r.getAs[String]("field1")
r.getAs[Long]("field2")

See the API docs for getAs(java.lang.String fieldName). Note that the cast still happens at runtime: if the type parameter doesn't match the schema, you get a runtime exception rather than a compile-time error.
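For example, the whole helper from the question collapses to this (a minimal sketch; the dataframe variable and the "field1"/"field2" names and types are taken from the question):

import org.apache.spark.sql.Row

def foo(r: Row): (String, Long) = {
  // Look up each field by name; the type parameter drives the cast
  (r.getAs[String]("field1"), r.getAs[Long]("field2"))
}
dataframe.map(foo)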

answered Oct 09 '22 by Kexin Nie


Extracting values by name with compile-time type checking is not supported in the Scala API at this time. The closest thing available is the open JIRA ticket titled "Support converting DataFrames to typed RDDs".
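For a sense of what that typed access looks like, here is a sketch using the Dataset API that later Spark versions (2.x) added; the Record case class and the SparkSession are assumptions for illustration, not part of the original answer:

import org.apache.spark.sql.SparkSession

// Hypothetical case class matching the DataFrame's schema
case class Record(field1: String, field2: Long)

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// .as[Record] checks field names against the schema and gives
// typed access to each field, checked at compile time
val typed = dataframe.as[Record]
typed.map(rec => (rec.field1, rec.field2))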

answered Oct 09 '22 by Justin Pihony