
How to get nth row of Spark RDD?

Suppose I have an RDD of arbitrary objects and I want to get, say, the 10th row of the RDD. How would I do that? One way is to call rdd.take(n) and then access the nth element of the returned list, but this approach is slow when n is large.

user1742188 asked Jan 07 '15 18:01



1 Answer

RDD.collect() and RDD.take(x) both return a list, which supports indexing. So to get the element at position N (counting from 1), either of the following works: RDD.collect()[N-1] or RDD.take(N)[N-1]. Note that collect() loads the entire RDD into the driver's memory, so take(N) is the safer choice when the RDD is large.
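Since the question is concerned with large n, here is a minimal sketch of an alternative that is not part of the answer above: pair each element with its position using zipWithIndex() and filter on that position, so only the matching element is sent back to the driver. The helper name nth_element is hypothetical. This still scans the RDD (and zipWithIndex() may run an extra job to count partition sizes), so whether it is actually faster than take(N) depends on the data and the cluster.

```python
def nth_element(rdd, n):
    """Return the element at 1-based position n of a PySpark RDD.

    Hypothetical helper for illustration; `rdd` is assumed to be a
    pyspark.RDD (or anything exposing the same methods).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    matches = (
        rdd.zipWithIndex()                          # (element, 0-based index) pairs
           .filter(lambda pair: pair[1] == n - 1)   # keep only position n
           .map(lambda pair: pair[0])               # drop the index again
           .collect()                               # at most one element reaches the driver
    )
    if not matches:
        raise IndexError("RDD has fewer than %d elements" % n)
    return matches[0]
```

Assuming an existing SparkContext named sc (not shown here), nth_element(sc.parallelize(range(100)), 10) should return 9, the same value as sc.parallelize(range(100)).take(10)[9].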

Neeraj Mehta answered Sep 28 '22 18:09