What are strongly typed and untyped APIs with respect to Spark Datasets? How are Datasets similar to and different from DataFrames?


Spark Datasets - strong typing


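DataFrame APIs are untyped because the type of the data is only known at runtime, whereas Dataset APIs are typed because the type is known at compile time. In Scala, a DataFrame is simply a Dataset[Row], so both run on the same engine; the difference is that a DataFrame holds generic Row objects whose columns are referenced by name (and checked only when Spark analyzes the query), while a Dataset[T] holds objects of a concrete class T, so field names and types are checked by the compiler.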


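df.select("device").where("signal > 10")   // untyped: column names and condition are strings, resolved at runtime
ds.filter(_.signal > 10).map(_.device)     // typed: fields of the Dataset's element class, checked at compile time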
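To make the two lines above concrete, here is a minimal, self-contained sketch. It assumes Spark 3.x with Scala 2.12/2.13 and `spark-sql` on the classpath, running on a local master; the `Reading` case class, the `TypedVsUntyped` object and its `untypedDevices`/`typedDevices` helpers are hypothetical names chosen for illustration. It also shows how the two APIs relate: `as[Reading]` turns a DataFrame into a typed Dataset, and `toDF()` goes back.

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, SparkSession}

// Hypothetical record type: field names mirror the device/signal columns in the snippet above.
case class Reading(device: String, signal: Int)

object TypedVsUntyped {

  // Untyped (DataFrame) version: column names and the filter condition are plain strings,
  // so a typo such as "signl" is only detected when Spark analyzes the query at runtime.
  def untypedDevices(df: DataFrame): Seq[String] =
    df.where("signal > 10").select("device").collect().map(_.getString(0)).toSeq

  // Typed (Dataset) version: fields are accessed through the Reading case class,
  // so a typo such as _.signl would be a compile error.
  def typedDevices(ds: Dataset[Reading]): Seq[String] = {
    import ds.sparkSession.implicits._ // provides the Encoder[String] needed by map
    ds.filter(_.signal > 10).map(_.device).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("typed-vs-untyped").getOrCreate()
    import spark.implicits._

    // DataFrame is Dataset[Row]; as[Reading] attaches a static element type to the same data.
    val df: DataFrame = Seq(("sensor-a", 5), ("sensor-b", 42)).toDF("device", "signal")
    val ds: Dataset[Reading] = df.as[Reading]

    println(untypedDevices(df))
    println(typedDevices(ds))

    // A misspelled column name compiles fine but fails during analysis at runtime.
    try df.select("signl").collect()
    catch { case e: AnalysisException => println(s"Runtime error: ${e.getMessage.take(60)}") }

    // Converting back to a DataFrame drops the static type again.
    val backToDf: DataFrame = ds.toDF()
    backToDf.printSchema()

    spark.stop()
  }
}
```

Both helpers return the same devices; the difference lies in when a mistake is reported. Writing `df.select("signl")` only fails when the program runs, whereas the equivalent typo in the typed lambda would stop compilation.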


answered Nov 10 '22 15:11

Vignesh I

Related questions

Spark Streaming on a S3 Directory
Spark Cassandra connector filtering with IN clause
How to do performance profiling of Hadoop cluster
Spark mllib predicting weird number or NaN
Is HDFS necessary for Spark workloads?
How to use window functions in PySpark using DataFrames?
How to include spark tests as Maven dependency
dataframe filter gives NullPointerException
spark finding max value and the associated key
Direct Kafka Stream with PySpark (Apache Spark 1.6)
Convert Scala expression to Java 1.8
How to set partition for Window function for PySpark?
Kafka topic partition and Spark executor mapping
Fetch spark job jar from Nexus
Date Arithmetic with Multiple Columns in PySpark
get topic from kafka message in spark
Can sparklyr be used with spark deployed on yarn-managed hadoop cluster?
Transforming PySpark RDD with Scala
run spark as java web application
Pyspark - how to do case insensitive dataframe joins?


Tags: dataset, apache-spark, apache-spark-dataset

Arvind Kumar
