Suppose I have an RDD of arbitrary objects and want to get, say, the 10th row. How would I do that? One way is to call rdd.take(n) and then access the nth element of the result, but this approach is slow when n is large.
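One possible alternative to rdd.take(n) is to pair each element with its position using zipWithIndex() and filter on that index, so that only the matching element is sent back to the driver. The following is a minimal sketch, not a definitive answer: the helper names nth_element and demo are hypothetical, and it assumes PySpark is installed for the demo part. Note that zipWithIndex() still has to pass over the RDD (and triggers a Spark job when the RDD has more than one partition), so this mainly reduces the data moved to the driver rather than the work done on the cluster.

```python
def nth_element(rdd, n):
    """Return the element at 1-based position n of an RDD, or None if n is out of range.

    Hypothetical helper: works on any object exposing the RDD methods
    zipWithIndex, filter, map and take.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    target = n - 1  # zipWithIndex assigns 0-based indices
    matches = (
        rdd.zipWithIndex()                          # (element, index) pairs
           .filter(lambda pair: pair[1] == target)  # keep only the wanted position
           .map(lambda pair: pair[0])               # drop the index again
           .take(1)                                 # at most one element reaches the driver
    )
    return matches[0] if matches else None


def demo():
    # Assumes a local PySpark installation. The import lives inside the
    # function so that nth_element has no import-time dependency on Spark.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(["a", "b", "c", "d", "e"], 3)
    print(nth_element(rdd, 4))
```

Calling demo() runs the lookup against a small local RDD. If many positional lookups are needed, it may be worth caching the indexed RDD instead of recomputing zipWithIndex() each time.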
To print the contents of an RDD, we can use either the collect action or the foreach action. RDD.collect() returns all the elements of the dataset as an array at the driver program, and we can print them by looping over this array. RDD.foreach(f) runs a function f on each element of the dataset.
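The collect-and-loop approach can be sketched as below. This is an illustrative sketch only: print_rdd and its limit parameter are hypothetical names, not part of Spark. Passing a limit switches from collect() to take(limit), which avoids pulling the whole dataset onto the driver.

```python
def print_rdd(rdd, limit=None):
    """Print RDD elements on the driver and return how many were printed.

    Hypothetical helper: with limit=None it uses collect() (whole RDD),
    otherwise take(limit) (only the first `limit` elements).
    """
    rows = rdd.collect() if limit is None else rdd.take(limit)
    for row in rows:
        print(row)
    return len(rows)

# The foreach variant would be rdd.foreach(print). It runs on the executors,
# so on a cluster the output appears in the executors' stdout rather than
# in the driver's console.
```

For example, print_rdd(rdd, limit=20) prints only the first 20 elements.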
Method 1: Using filter(). This function filters a DataFrame, keeping only the records that satisfy the given condition. Example: Python code to select rows of the DataFrame based on the subject2 column.
collect. Return a list that contains all of the elements in this RDD. This method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.
An RDD is a distributed collection of data elements without any schema. A Dataset, by contrast, is an extension of DataFrames with more features, such as type safety and an object-oriented interface.
RDD.collect() and RDD.take(x) both return a list, which supports indexing. So whenever we need the element at position N, we can use either of the following two expressions:
RDD.collect()[N-1]
or
RDD.take(N)[N-1]
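Either works when we want the element at position N (counting from 1). RDD.take(N) is usually the better choice, because collect() loads the entire RDD into the driver's memory.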