 

Clone/Deep-Copy a Spark DataFrame

How can a deep copy of a DataFrame be requested, without resorting to a full re-computation of the original DataFrame's contents?

The purpose is to perform a self-join on a Spark Stream.

asked Jul 15 '19 by WestCoastProjects

People also ask

How do I copy a PySpark DataFrame to another?

schema. copy" new schema instance created without old schema modification; In each Dataframe operation, which return Dataframe ("select","where", etc), new Dataframe is created, without modification of original. Original can be used again and again.

How do I copy one DataFrame to another?

To copy a Pandas DataFrame, use the copy() method. The DataFrame.copy() method makes a copy of the provided object's indices and data. It accepts one parameter called deep, and it returns the Series or DataFrame that matches the caller.


1 Answer

DataFrames are immutable. That means you don't have to make deep copies: you can reuse a DataFrame multiple times, every operation creates a new DataFrame, and the original stays unmodified.

For example:

import spark.implicits._ // for toDF and the $ column syntax (spark is the SparkSession, e.g. in spark-shell)

val df = List(1, 2, 3).toDF("id") // original dataframe

val df1 = df.as("df1") // second dataframe (an alias over df)
val df2 = df.as("df2") // third dataframe (another alias over df)

df1.join(df2, $"df1.id" === $"df2.id") // fourth dataframe, and df is still unmodified

It might seem like a waste of resources, but since all data in a DataFrame is also immutable, all four DataFrames can reuse references to the same underlying objects.
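
As for the original motivation of a self-join on a Spark Stream: the same aliasing pattern carries over to Structured Streaming. The sketch below is an assumption-laden illustration, not part of the original answer; it uses the built-in rate source just to have a streaming DataFrame, and it assumes a Spark version recent enough to support stream-stream self-joins:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("streaming-self-join-sketch").getOrCreate()
import spark.implicits._

// Streaming source with columns timestamp and value, used purely for illustration
val stream = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

// Same aliasing trick as above: two views of one streaming DataFrame, no deep copy needed
val joined = stream.as("s1").join(stream.as("s2"), $"s1.value" === $"s2.value")

// Inner stream-stream joins accumulate state; real code should add watermarks to bound it
joined.writeStream.format("console").start().awaitTermination()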

answered Nov 22 '22 by Krzysztof Atłasik