My row RDD looks like this:
Array[org.apache.spark.sql.Row] = Array([1,[example1,WrappedArray([Standford,Organisation,NNP], [is,O,VP], [good,LOCATION,ADP])]])
I got this from converting a dataframe to an RDD; the dataframe schema was:
root
|-- article_id: long (nullable = true)
|-- sentence: struct (nullable = true)
| |-- sentence: string (nullable = true)
| |-- attributes: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- tokens: string (nullable = true)
| | | |-- ner: string (nullable = true)
| | | |-- pos: string (nullable = true)
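(The conversion was presumably something along these lines; df here is an assumed name for the dataframe with that schema:)

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// .rdd turns a DataFrame into an RDD of generic Row objects
val rowRDD: RDD[Row] = df.rdd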
Now how do I access elements in the row RDD? In the dataframe I could use df.select("sentence"). I want to access the nested elements, e.g. "Standford" and the other nested fields.
Some background. collect returns an array containing all of the elements in the RDD; it should only be used when the result is expected to be small, as all the data is loaded into the driver's memory. A row in Spark is an ordered collection of fields that can be accessed starting at index 0. The row is a generic object of type Row, and the columns making up the row can be of the same or different types.
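As a small, self-contained illustration of that (the session and values here are hypothetical, not taken from the question's data):

import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
val rdd = spark.sparkContext.parallelize(Seq(Row(1L, "example1")))

// collect() brings every element into the driver - fine for small results
val rows: Array[Row] = rdd.collect()

// fields are accessed by position, starting at index 0, and can differ in type
val id: Long = rows(0).getLong(0)
val text: String = rows(0).getString(1)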
As @SarveshKumarSingh wrote in a comment, you can access the rows in an RDD[Row] just as you would access any other element in an RDD. Accessing the elements within a row can then be done in a couple of ways. Either simply call get, like this:
rowRDD.map(row => row.get(1).asInstanceOf[Row]) // index 1 is the "sentence" struct in your schema
or, if it is a built-in type, you can avoid the type cast:
rowRDD.map(row => row.getLong(0)) // index 0 is the "article_id" column
or you might want to simply use pattern matching, like:
rowRDD.map { case Row(articleId: Long, sentence: Row) => sentence }
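Putting those together for your schema, getting at the nested pieces (e.g. "Standford") could look like this. This is a sketch: rowRDD is assumed to be the RDD[Row] from your conversion, and the field positions follow the schema you printed.

import org.apache.spark.sql.Row

val tokens = rowRDD.map { row =>
  val sentenceStruct = row.getStruct(1)           // the "sentence" struct (index 1)
  val attributes = sentenceStruct.getSeq[Row](1)  // its "attributes" array (index 1)
  attributes.map(attr => attr.getString(0))       // the "tokens" field of each struct
}
// For the row in your example, tokens.first() would be Seq("Standford", "is", "good").

You can also fetch fields by name rather than position, e.g. row.getAs[Row]("sentence").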
I hope this helps :)