I'm just wondering what is the difference between an RDD
and DataFrame
(Spark 2.0.0 DataFrame is a mere type alias for Dataset[Row]
) in Apache Spark?
Can you convert one to the other?
DataFrames are a SparkSQL data abstraction and are similar to relational database tables or Python Pandas DataFrames. A Dataset is also a SparkSQL structure and represents an extension of the DataFrame API. The Dataset API combines the performance optimization of DataFrames and the convenience of RDDs.
3.14. RDD – RDD API is slower to perform simple grouping and aggregation operations. DataFrame – DataFrame API is very easy to use. It is faster for exploratory analysis, creating aggregated statistics on large data sets. DataSet – In Dataset it is faster to perform aggregation operation on plenty of data sets.
Using the Spark DataFrame API A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R or in the Python pandas library.
Dataset API is a set of operators with typed and untyped transformations, and actions to work with a structured query (as a Dataset) as a whole.
A DataFrame
is defined well with a google search for "DataFrame definition":
A data frame is a table, or two-dimensional array-like structure, in which each column contains measurements on one variable, and each row contains one case.
So, a DataFrame
has additional metadata due to its tabular format, which allows Spark to run certain optimizations on the finalized query.
An RDD
, on the other hand, is merely a Resilient Distributed Dataset that is more of a blackbox of data that cannot be optimized as the operations that can be performed against it, are not as constrained.
However, you can go from a DataFrame to an RDD
via its rdd
method, and you can go from an RDD
to a DataFrame
(if the RDD is in a tabular format) via the toDF
method
In general it is recommended to use a DataFrame
where possible due to the built in query optimization.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With