Performance Impact of RDD to JavaRDD conversion

Tags: java, scala, apache-spark, rdd

I have code something like this, and I want to work with JavaRDD instead of RDD, so I'm doing a conversion here. I would like to know the performance impact of this conversion, especially when dealing with GBs of data.

RDD<String> textFile = sc.textFile(filePath, 2);
JavaRDD<String> javaRDD = textFile.toJavaRDD();

Is this a wide or a narrow transformation? What is the difference between JavaRDD and RDD?


asked May 28 '16 09:05

Balaji Reddy

1 Answer

There's no significant performance penalty: JavaRDD is a simple wrapper around RDD that makes calls from Java code more convenient. It holds the original RDD as a member and delegates every method call to that member, for example (from JavaRDD.scala):

def cache(): JavaRDD[T] = wrapRDD(rdd.cache())

wrapRDD boils down to something like new JavaRDD[T](rdd), so the only cost is creating a thin wrapper object on each method invocation. That is negligible because it happens once per RDD object, not once per element in the RDD. For the same reason, toJavaRDD() is neither a wide nor a narrow transformation: it adds no step to the RDD lineage and moves no data.
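To make the wrapper idea concrete, here is a minimal sketch that uses only the Java standard library, so it runs without Spark. CoreDataset and JavaView are hypothetical stand-ins for RDD and JavaRDD, not Spark classes; the sketch only illustrates that wrapping copies a reference, never the elements.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WrapperSketch {

    // Hypothetical stand-in for the Scala RDD: it owns the (potentially huge) data.
    static final class CoreDataset<T> {
        final List<T> elements;

        CoreDataset(List<T> elements) {
            this.elements = elements;
        }

        // The real work happens here, once per element.
        <R> CoreDataset<R> map(Function<? super T, ? extends R> f) {
            return new CoreDataset<>(elements.stream().map(f).collect(Collectors.toList()));
        }

        // Analogue of toJavaRDD(): allocates one small wrapper, touches no elements.
        JavaView<T> toJavaView() {
            return new JavaView<>(this);
        }
    }

    // Hypothetical stand-in for JavaRDD: holds a reference and delegates to it.
    static final class JavaView<T> {
        private final CoreDataset<T> core;

        JavaView(CoreDataset<T> core) {
            this.core = core;
        }

        CoreDataset<T> core() {
            return core;
        }

        // Analogue of wrapRDD(rdd.map(...)): delegate, then wrap the result again.
        <R> JavaView<R> map(Function<? super T, ? extends R> f) {
            return new JavaView<>(core.map(f));
        }
    }

    public static void main(String[] args) {
        CoreDataset<String> lines = new CoreDataset<>(List.of("a", "bb", "ccc"));
        JavaView<String> view = lines.toJavaView();

        // The wrapper points at the very same dataset object; nothing was copied.
        System.out.println(view.core() == lines); // prints "true"

        // Delegated call: one extra wrapper allocation, independent of data size.
        System.out.println(view.map(String::length).core().elements); // prints "[1, 2, 3]"
    }
}
```

In Spark itself, the accessor corresponding to core() in this sketch is JavaRDD.rdd(), which hands back the wrapped RDD.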


answered Sep 22 '22 08:09

Tzach Zohar
