I have code something like this, and I want to work with JavaRDD instead of RDD, so I'm doing a conversion here. I would like to know the performance impact of this conversion, especially when I'm dealing with GBs of data.
RDD<String> textFile = sc.textFile(filePath, 2);
JavaRDD<String> javaRDD = textFile.toJavaRDD();
Is this a wide or a narrow transformation? What is the difference between JavaRDD and RDD?
There is no significant overhead when converting a DataFrame to an RDD with df.rdd: a DataFrame already keeps an instance of its RDD initialized, so returning a reference to that RDD should not have any additional cost.
(If you're new to Spark, JavaRDD is a distributed collection of objects, in this case lines of text in a file. We can apply operations to these objects that will automatically be parallelized across a cluster.)
Some other RDD operations: cogroup is also called groupWith. cartesian, when called on datasets of types T and U, returns a dataset of (T, U) pairs (all pairs of elements). pipe sends each partition of the RDD through a shell command, e.g. a Perl or bash script. coalesce decreases the number of partitions in the RDD to numPartitions.
Actions are Spark RDD operations that return non-RDD values. The result of an action is returned to the driver or written to an external storage system. Actions are what set the lazy evaluation of an RDD in motion. An action is one of the ways of sending data from an executor to the driver. Executors are agents responsible for executing tasks.
There's no significant performance penalty: JavaRDD is a simple wrapper around RDD, just to make calls from Java code more convenient. It holds the original RDD as its member and delegates every method invocation to that member, for example (from JavaRDD.scala):
def cache(): JavaRDD[T] = wrapRDD(rdd.cache())
wrapRDD boils down to something like new JavaRDD[T](rdd), so the only performance penalty is creating a thin Java object for every method invocation. That is entirely negligible, because it is done once for the entire object, not once per element in the RDD.
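To make the wrapper idea concrete, here is a minimal, self-contained sketch of the same pattern in plain Java. MiniRDD and MiniJavaRDD are hypothetical stand-ins invented for this illustration. They are not Spark classes, and the toy map is eager, unlike Spark's lazy one. The only point is that the conversion stores a reference and never touches the elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WrapperSketch {

    // Hypothetical stand-in for an RDD (NOT Spark's class): it just holds a list.
    static class MiniRDD<T> {
        final List<T> data;

        MiniRDD(List<T> data) {
            this.data = data;
        }

        // Toy map: eager here for simplicity, whereas Spark's map is lazy.
        <U> MiniRDD<U> map(Function<T, U> f) {
            return new MiniRDD<>(data.stream().map(f).collect(Collectors.toList()));
        }

        // The "conversion": allocates one small wrapper, reads no elements.
        MiniJavaRDD<T> toJavaRDD() {
            return new MiniJavaRDD<>(this);
        }
    }

    // Hypothetical stand-in for JavaRDD: keeps the original as a member and delegates.
    static class MiniJavaRDD<T> {
        private final MiniRDD<T> rdd;

        MiniJavaRDD(MiniRDD<T> rdd) {
            this.rdd = rdd;
        }

        // Returns the wrapped object itself, not a copy.
        MiniRDD<T> rdd() {
            return rdd;
        }

        // Delegate to the member, then re-wrap the result (one wrapper per call).
        <U> MiniJavaRDD<U> map(Function<T, U> f) {
            return new MiniJavaRDD<>(rdd.map(f));
        }
    }

    public static void main(String[] args) {
        MiniRDD<String> lines = new MiniRDD<>(Arrays.asList("a", "bb", "ccc"));
        MiniJavaRDD<String> javaLines = lines.toJavaRDD();

        // Same underlying object: the conversion copied nothing.
        System.out.println(javaLines.rdd() == lines); // prints "true"

        // The wrapper's map delegates to the wrapped object's map.
        System.out.println(javaLines.map(String::length).rdd().data); // prints "[1, 2, 3]"
    }
}
```

The cost of toJavaRDD() in this sketch is one object allocation regardless of how many elements the list holds, which mirrors the argument above about the size of the data not mattering.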